This is a prepublication draft of a survey and analysis of
the basic perceptual determinants that may affect viewers' 
responses to television as a unique mode of visual stimula- 
tion. As the first attempt of its kind, it must undoubtedly 
be incomplete, and may contain errors of detail or emphasis, 
but it should provide a foundation that can be filled in 
further, expanded, and revised with relative ease. It is 
more difficult to assess the practical importance of many
of the factors that are theoretically relevant to the perception
of video displays, and areas of research have been suggested
where the need for such research has seemed most
apparent to us.
FOREWORD


The Television Laboratory at WNET/13 was formed in early 
1972 to research and develop the aesthetic and technological 
potential of the television medium. Since its beginnings, 
the Lab has been supported by grants from The Rockefeller 
Foundation and the New York State Council on the Arts, with 
special project support coming from the National Endowment 
for the Arts.


Several years ago, Dr. John Knowles, President of The Rocke- 
feller Foundation, watched a man experience an epileptic 
seizure which appeared to have been induced directly by the 
"roll" of his television set. Subsequently, Dr. Knowles 
encouraged the Lab to extend its research into the area of 
perception and physiology in an effort to shed new light on 
the medium as a unique mode of visual stimulation. 


In 1973, the Lab commissioned Dr. Julian Hochberg, Chairman 
of the Psychology Department at Columbia University, to be- 
gin major research in that area. The result, "The Perception 
of Television Displays," written by Dr. Hochberg and his 
associate Dr. Virginia Brooks, is the first known attempt to 
survey the mass of individual-related research conducted 
throughout the years, and to analyze that research. 


The paper, as Dr. Hochberg states, is "as a first attempt, 
undoubtedly incomplete and may contain errors of detail or 
emphasis; but it should provide a foundation that can be 
filled in further, expanded, and revised with relative ease."
"The Perception of Television Displays" is rich in information.
At this point in time, in light of rapid technological
development, it is difficult to assess the practical value
of each point. However, without minimizing the importance 
of all the information, there are several points covered 
which seem particularly noteworthy. 


Video displays (pictures) have particular characteristics 
which make them different from cinema and other forms of 
visual displays. Video displays flicker 60 times per sec- 
ond with each "field." The first field is composed of the 
odd-numbered "scan lines"; the second is composed of the
even-numbered scan lines. A "frame" consists of two fields;
the complete picture of 525 lines occurs 30 times per second
(in the American system). The size of home television
receivers and the distance at which people sit from them
result in the stimulation of the retina over a smaller
area compared to stimulation from conventional movies, for
instance. These characteristics have interesting consequences
(both favorable and unfavorable) when also considering
the structure and functions of the human visual system from
eye to brain.


One of those consequences is the apparent ability of "flicker"
(i.e., lightness changes in the display) to induce, among
other physiological responses, epileptic seizures in those
few individuals who are particularly sensitive to this type
of stimulation. Perhaps even more important than this, Drs.
Hochberg and Brooks also ascertained that epileptic sensitivity
to flicker-induced seizures can be extinguished or
greatly reduced by "teaching" those affected to "unlearn"
their sensitivity.


The authors also state that certain measures of brain func- 
tion, such as alpha-rhythms, can be employed effectively as 
indicators of attention to video images. And also, the speed 
and accuracy with which text is read on the video screen can 
be increased by, for instance, controlling the pictorial 
images which accompany it. 


Of particular interest to video artists may be the informa- 
tion relating to the physiological and psychological effects 
of different cutting rates (editing rates) and techniques and 
their relationship to the limited display size and detail of 
the television receiver; the text also covers points regard- 
ing the acuity factors that affect visibility of details in 
the display, and the effects of moire patterns produced by
the interaction of the scan raster with certain other patterns
(such as stripes).


Throughout the work, the authors have indicated the great 
need for more research in specific areas and have also outlined
procedures for research in many instances. For example,
little is known about the possible undesirable
responses of normal viewers to pictures which induce repeti- 
tive eye movements; specific effects on the visuomotor system 
when viewing the world through such a small "window"; viewing 
distances as related to age groups and socioeconomic strata;
and the effects of synthetic visual surfaces, volumes, and
edges which can now be effectively created by computers
and other related equipment.


The need for more research is undeniably apparent if more 
is to be learned about the effects the television medium 
can have on generations of people. 


The Television Laboratory is proud to have had the opportu- 
nity to sponsor this major step toward further understanding 
of the television medium as a scientific field of study. We 
extend our deepest thanks to the authors and hope that "The
Perception of Television Displays" will prove a successful 
foundation for gaining new and more valuable insights. 


David R. Loxton, Director 
The Television Laboratory 
INTRODUCTION


The simplest prescription for making a picture that is 
generally recognizable is to make a surrogate display that 
presents to the eye exactly the same light distribution as 
does the scene that is to be depicted. Fidelity is the 
degree of similarity between the light distributions pre- 
sented by the scene and the surrogate. The various photo- 
graphic media are designed to produce surrogates auto- 
matically, with different kinds of departure from fidelity 
in each case. Cinema (meaning motion pictures on film) and 
video (meaning electronically transmitted and stored visual 
displays, as in TV) are produced by a stroboscopic (interrupted)
sequence of still pictures. Both cinema and video
draw on perceptual mechanisms that are only slightly under- 
stood. Stroboscopic techniques (cinema and video) are 
capable of more fidelity than are still pictures, in the 
sense that the former permit motion and movement parallax 
(the strongest depth cue, with the possible exception of 
binocular stereopsis) to be presented. Two facts associated
with video displays can cause major departures from fidelity, 
however. In both cinema and video, the camera can change its 
viewpoint while the screen remains fixed. In addition, video 
displays characteristically further dissect each individual 
picture in the stroboscopic sequence by means of a relatively 
coarse, repetitive scanning raster (i.e., into thin horizontal
bands that vary in luminance from one horizontal point
to the next). In this paper, we consider primarily the perceptual
consequences of these facts that are specific to the
video media. As we shall see, video displays have some very 
peculiar features indeed, about whose consequences we know 
relatively little. (This does not imply that video displays 
comprise a learned "visual language," consisting of assemblages
of arbitrary symbols, and that the artist and engineer
can therefore use whatever techniques they can devise and
desire. Instead, it is most likely that the perception of
these displays, even though they are quite unnatural, rests
heavily on mechanisms evolved or learned in the course of
perceiving the natural world, and that any additional departure
from fidelity that the video artist may wish to employ
should be undertaken with concern about possible losses of
intelligibility.)








We consider 1) the factors that are associated with the 
stroboscopic nature of the display (i.e., repeated inter- 
ruption, and its potential consequences, both desirable and 
undesirable); 2) factors that are peculiar to video display 
contours (largely to do with the raster used in scanning each 
frame); 3) the acuity factors that affect the visibility of 
details in the display; and 4) the consequences that limited 
display-size and display-detail may have for the applications 
of cutting technique to the presentation of scenes larger 
than the video screen. Cutting technique per se (as it con- 
tributes to the communication of meaning), and the principles 
that govern piecemeal perception of scenes and events in 
general, are only touched on lightly in this paper. 
(1) Effects of interrupted (stroboscopic) presentation


(1.1) Frame rate and detectible interruption 


Stroboscopic interruption is detectible in several dif- 
ferent ways, few of which have been separately studied. Such 
interruption may cause a change in apparent lightness; notice- 
able discontinuities or jerkiness in any movement that is being 
depicted; an "activity," busyness or other abnormal vividness. 
Most research has been directed to the detection of lightness, 
or flicker. 


(1.1.1) Factors affecting flicker-detection


Most sensory (and electrophysiological) flicker studies 
have varied the luminance of a relatively small (e.g., 2°) 
sharply bounded patch of light (see Fig. 1), either in an on/ 
off pattern, in a sine wave, or in a combination of frequen- 
cies. Thus, a simple contour (between the patch and its back- 
ground) has been involved in such studies, an immense number 
of which have been performed (for recent reviews, see 152, 111, 
145, 171). But there have been very few studies of flicker as 
a function of the contour's shape (cf. 112), and no studies 
(that we know of) of the interruption or "raggedness" of de- 
picted movement (cf. 1.4.1, below, for a related phenomenon). 


(1.1.1.1) Frame and scan rates are probably above CFF
for most normal conditions and viewers


No current theoretical model can fully describe the
visual system's behavior, even with the simple cases studied, 
but there is a great deal of empirical data available. 


CFF is the abbreviation for Critical Flicker Frequency, 
the threshold frequency above which the interruptions are not 
reliably detected. In general, the following facts are prob- 
ably generalizable: 


(a) CFF varies as the log stimulus amplitude: i.e.,
with a greater light/dark ratio, flicker can be detected by
the viewer at higher frequencies. (b) Above CFF (i.e., where
flicker can no longer be detected), the apparent brightness of
the screen approximates that of a steady stimulus of the same
average luminance.
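
Statements (a) and (b) can be put in numerical form. The sketch
below is only an illustration: it assumes a log-linear relation
for (a), in the spirit of what is usually called the Ferry-Porter
law, and a simple time average for (b), usually called the
Talbot-Plateau law. The slope and intercept are hypothetical
values chosen for the example and are not taken from any of the
studies cited here; the function names are likewise our own.

```python
import math

# Hypothetical coefficients, chosen only for illustration; they are
# not taken from the studies cited in this paper.
SLOPE_HZ_PER_LOG_UNIT = 10.0
INTERCEPT_HZ = 35.0


def cff_estimate(light_dark_ratio: float) -> float:
    """Statement (a): CFF rises linearly with the log of the amplitude."""
    if light_dark_ratio <= 0:
        raise ValueError("light/dark ratio must be positive")
    return INTERCEPT_HZ + SLOPE_HZ_PER_LOG_UNIT * math.log10(light_dark_ratio)


def fused_luminance(on_luminance: float, off_luminance: float,
                    duty_cycle: float) -> float:
    """Statement (b): above CFF the display matches a steady light of
    the same time-averaged luminance."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return duty_cycle * on_luminance + (1.0 - duty_cycle) * off_luminance


if __name__ == "__main__":
    for ratio in (2.0, 10.0, 100.0):
        print(f"light/dark ratio {ratio:6.1f}: CFF about {cff_estimate(ratio):.1f} Hz")
    print("fused luminance:", fused_luminance(100.0, 0.0, 0.5))
```

With these assumed coefficients, each tenfold increase in the
light/dark ratio raises the estimated CFF by the same number of
Hz; only that qualitative pattern, not the particular numbers,
should be read from the example.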


Somewhat less generalizable are the following statements: 








Detection is most sensitive in the 5-10 Hz range, 
falling off sharply at higher frequencies (61, 60, 239). The 
use of a large, uniformly-flickering field (65°), with blurred 
as opposed to sharp edges, raises the upper cut-off of detect- 
ibility slightly, and decreases the sensitivity to low fre- 
quencies more markedly, thus sharpening the curve that graphs 
the probability of flicker detection against frequency to a 
peak near 20 Hz (154). The TV display is produced by a rapidly 
moving spot, of variable brightness, that moves in a horizontal 
sweep across the screen, and then traces another line across
the screen, and so on, until the screen has been filled. This
happens twice for each still picture or frame that comprises
the stroboscopic sequence, leaving a space between alternate
lines on the first set of sweeps, and filling in the remaining
spaces on the second set. Thus, although the dot has traversed
the screen 60 times per second (i.e., with a 60 Hz rate), the 
picture can only change 30 times each second. Since detectibility
at high and low frequencies is differently affected
by the size of the area, and by the number and sharpness of
edges in the visual field (153), and because 20 Hz is a peak
in the detection curve rather than an absolute limit, it is
not automatically safe for us to consider the 30 Hz frame rate
and the 60 Hz half-raster rate of TV as being completely above
CFF. Moreover, the mutual masking that we would expect to
occur between alternate lines of the interlaced raster should
complicate the picture theoretically, and the fact that CFF
varies with location on the retina (and that the CFF of a
flickering target region is affected by the interruption rate
of the surrounds (172), even if the latter regions are below
CFF) complicates matters still more. To this last point: If we
consider that a TV screen as normally viewed subtends an area 
of about 15° (and a motion picture screen subtends an area of 
about 45° - see Figs. 2a,b), we can extract rough approxima- 
tions of the CFF at various parts of the eye from existing 
data on peripheral sensitivity to flicker (14): 27 Hz at the 
fovea (the center of the retina); 19 Hz at 15° from the center; 
and 14 Hz at 40° from the center. Most of the TV screen should
therefore fall within a zone having a 19-27 Hz CFF. Although
only experiments can answer the question with assurance,
it does seem plausible that the 60 Hz rate with which
adjacent lines of the interlaced raster of video displays are 
presented, taken together with the fact that (relatively) slow- 
decay phosphors are used in TV picture tubes, will put the 
scanning rate past CFF under normal viewing conditions. 
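
The arithmetic behind this comparison can be laid out as a short
sketch. It uses only the nominal figures given in the text (525
lines, 60 fields per second, two fields per frame) and the rough
CFF approximations quoted above for three retinal locations. The
function and constant names are our own, and the comparison is
deliberately crude: as noted, those CFF values come from studies
of small flickering patches and need not carry over to a
full-screen interlaced display.

```python
LINES_PER_FRAME = 525      # from the text (American system)
FIELDS_PER_SECOND = 60     # each field carries alternate scan lines
FIELDS_PER_FRAME = 2

# Approximate CFF (Hz) by retinal eccentricity (degrees), as quoted
# in the text from data on peripheral sensitivity to flicker.
CFF_BY_ECCENTRICITY_HZ = {0: 27.0, 15: 19.0, 40: 14.0}


def frame_rate_hz() -> float:
    """Complete pictures per second."""
    return FIELDS_PER_SECOND / FIELDS_PER_FRAME


def line_rate_hz() -> float:
    """Horizontal sweeps per second."""
    return LINES_PER_FRAME * frame_rate_hz()


def rates_above_cff(rate_hz: float) -> dict:
    """For each eccentricity, is the given rate above the quoted CFF?"""
    return {ecc: rate_hz > cff for ecc, cff in CFF_BY_ECCENTRICITY_HZ.items()}


if __name__ == "__main__":
    print("frame rate (Hz):", frame_rate_hz())
    print("line rate (Hz):", line_rate_hz())
    print("field period (ms):", round(1000.0 / FIELDS_PER_SECOND, 2))
    print("30 Hz above quoted CFF?", rates_above_cff(frame_rate_hz()))
    print("60 Hz above quoted CFF?", rates_above_cff(float(FIELDS_PER_SECOND)))
```

On these figures the 30 Hz frame rate exceeds the quoted foveal
CFF of 27 Hz by only a small margin, which is one way of seeing
why the frame rate cannot simply be assumed to be safe.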


But this does not mean that no effects of stroboscopic 
interruption are to be expected, for 3 reasons: 
(1) The interruptions might be undetectible as a
flickering brightness-change, yet might still produce physio- 
logical effects. This is not a very plausible possibility 
for two reasons: (a) Because flicker detection appears to 
have peripheral (but not photochemical) bases (92) so that if 
the interruption has any effects, it should produce flicker; 
(b) It would seem reasonable to guess that for flicker to 
have an effect, it must be consciously detectible by the 
viewer, because it appears that if we mask a flickering light 
by adding a steady light, the flicker loses its ability to 
elicit convulsive seizures (see 1.2.2.3, below).


(2) There obviously are many sources of flicker in 
video displays, besides the frame and raster rates, that are
well within CFF (especially in sets that are not in perfect 
adjustment), and we do not know their prevalence or mix. 


(3) As experimental video techniques (and particularly,
computer-generated displays) are applied to the TV
medium, and as higher bright/dark ratios are obtained, we 
can expect detectible flicker to increase (1.5.2, below). 


Research is therefore needed on the conditions of 
flicker-detection with stroboscopic displays in the frequency 
ranges to be expected in TV display, with special concern for 
effects on physiological responses, on affective responses, 
and on patterns of eye movement. 


(1.1.1.2) Stroboscopic enhancement of apparent
brightness 


Stroboscopic flicker has been shown to affect the conspicuity
of a light (i.e., its power to attract and/or hold
"attention," measured mostly by such not-very-satisfactory
indices as reaction time, apparent brightness, etc.).
Bartley (21, 22) believes the enhancement to be a central (as
opposed to peripheral) effect, due to "driving" cortical alpha 
waves (about which, more in 1.2.2, below), a not terribly 
likely explanation. But enhancement does appear to be maximal 
at about 10 Hz (1, 9, 18, 20, 21, 22, 170), which means that 
pulses in this range should have good eye-catching ability - 
if they can be tolerated, which they may not be (see section 
1.2.1, below). 


This statement about enhancement cannot be taken as fully 
established however: for one thing, Bartley found enhancement 
with reasonably bright light (ca 150 mL), and an actual decrement
with dim light, but there are Russian reports (111,
pp. 52f.) to the effect that the flashing light's advantage
holds only in the threshold region (i.e., for dim lights). 
(But the balance of evidence seems to support Bartley.) More- 
over, a TV screen has most of the characteristics that are 
normally good eye-catchers (1.3, below), in any case. In 
general, as we might expect from the CFF data cited above, 
whereas the fovea can resolve time-differences of approxi- 
mately 5 milliseconds between adjacent patches of pulsed 
light, peripheral thresholds are much higher; but the percep- 
tion of movement is reported to be more precise in the periphery 
than in the fovea (230), so that the moving contours that are 
normally present on the TV screen should in themselves provide 
good conspicuity in the sense that they are readily picked up 
in peripheral vision; see also 1.3 below. And because the TV 
screen is probably already an eye-catcher, flicker-enhancement 
may not be able to raise its conspicuity appreciably in this 
regard. 


(1.2) Affective and physiological effects of strobo- 
scopic lights 


(1.2.1) Response to stroboscopic displays: motion 
pictures and flashing lights 


A very few studies have examined broad questions about 
the effects of film, such factors as the following: the 
effects on motility during sleep (30, 32); effects on GSR and 
heart rate while viewing film, as a function of subjects'
reported interest (33, 34); effects on body temperature
(as an indicator of muscular tension; 35). GSR scores measured 
during the viewing of an abstract film were found to correlate 
significantly with viewers' ACE (intelligence test) scores (36), 
and motion pictures have been used as stimuli for arousal in 
the study of physiological indices of emotion. These are rel- 
atively gross measures. Fine-grain frame-by-frame monitoring 
of GSR (resistance of the skin to electrical current),
cardiotachometric, pupillary reflex, etc., responses to motion picture
or video displays, are now feasible, and await only some the- 
oretical reason to be undertaken: gathered on an unguided, 
exploratory basis, such data can swamp the collector past 
relief. We here note that a detailed description of apparatus
and procedure for recording frame-by-frame GSR responses has
been published (173), and that some subjects responded
to a flash of light within 100-200 milliseconds after the
stimulus, which is much faster than is usually found with GSRs (177).
The increase was followed by an equally rapid decrease (this
much is not surprising: rise times and fall times are correlated
in GSRs of the more usual variety; cf. 177). If
substantiated, this rapid GSR response may be of interest in 
the study of stroboscopic presentations, and particularly in 
relation to maintaining the visual arousal that results from 
optimal cutting rate (cf. 4.1, below). In any case, further 
inquiry here requires specific theories to guide it. We
shall mention some of these later on. Of greater interest
now than either alpha or short-latency GSR activity would
seem to be the study of CNV (Contingent Negative Variation), 
or E-wave (expectancy-wave). This is a slow rise in surface 
electronegativity of the frontal cortex (246), that seems at 
present to be the closest correlate of attention, expectancy 
(122, 159, 246, 182) and of arousal or preference (59). Pupil- 
lary response also represents a similar cluster of affective 
processes (47, 119, 120, 148, 149). The advantage of the 
latter is that it requires no electrodes to be attached to the 
subject, but it possesses its own methodological problems which 
may make it impossible to use (102) in practice. Perhaps car- 
diac deceleration, which has been theoretically linked to CNV 
(167), might supplement or substitute for either CNV or the 
pupillary response. But in any case, it would be important to 
study how one or more of these measures respond to repetitive 
view-changes, time-locked to those changes as the latter are 
varied in rate and abruptness: i.e., to present a display in 
which some stimulus change occurs repetitively in a system- 
atic fashion, and to average over many such repetitions the 
time course of the physiological responses after the stimulus 
change occurrences (see Fig. 3 and section 4.1). 
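
The averaging procedure proposed here can be sketched in a few
lines. The example is a minimal illustration under our own
assumptions: the physiological record is taken to be a list of
equally spaced samples, the stimulus changes are given as sample
indices, and the function name and the synthetic data are
hypothetical. It is not a description of any apparatus cited in
this paper.

```python
from typing import List, Sequence


def time_locked_average(signal: Sequence[float],
                        event_indices: Sequence[int],
                        window: int) -> List[float]:
    """Average the `window` samples that follow each stimulus change.

    Events too close to the end of the record to yield a complete
    window are skipped.
    """
    epochs = [signal[i:i + window] for i in event_indices
              if 0 <= i and i + window <= len(signal)]
    if not epochs:
        raise ValueError("no complete epochs to average")
    return [sum(epoch[k] for epoch in epochs) / len(epochs)
            for k in range(window)]


if __name__ == "__main__":
    # Synthetic record: the same 4-sample response after every change,
    # plus an alternating disturbance that is not locked to the changes.
    response = [0.0, 1.0, 0.5, 0.0]
    events = [0, 5, 10, 15]
    signal = [0.0] * 20
    for onset in events:
        for k, value in enumerate(response):
            signal[onset + k] += value
    disturbance = [0.2 if n % 2 == 0 else -0.2 for n in range(20)]
    noisy = [s + d for s, d in zip(signal, disturbance)]
    print(time_locked_average(noisy, events, window=4))
```

Because the disturbance in the synthetic record bears no fixed
relation to the stimulus changes, it tends to cancel in the
average, while the response that is locked to each change
survives. That is the reason for averaging over many repetitions.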


(1.2.2) Electro-Cortical responses to stroboscopic 
displays 


Although much research has been done on brain activity 
as an index of attention (cf. 72 for a review) and of percep- 
tual processing (cf. 199), the work that is closest to our 
present interest has been concerned primarily with alpha wave 
activity. We discuss (1) the general hypothesis that has made 
these waves of special interest in connection with video dis- 
plays; (2) its supposed relationship to stroboscopic lights; 
and (3) the specific phenomenon of photically-driven convul- 
sions in epileptic viewers. 


(1.2.2.1) Alpha rhythm and perceptual sampling 
hypotheses 








Once we consider the perception of successive events, 
and particularly, the perception of events whose temporal 
distribution has some periodic properties (e.g., the 30 Hz
frame rate), the issue of perceptual sampling rate arises: 
How rapidly in time are events resolved (i.e., separately 
detected), and with what independence? The most extreme
answer to this question is the perceptual moment theory.


This was probably first used in one of G. K. Chester- 
ton's "Father Brown" stories as a general idea. More spe- 
cifically, Stroud (226) suggested that we can perceive events 
only in a series of discrete "moments" (of about 100 milli- 
seconds duration), and that the order (and successiveness or 
non-simultaneity) of any two successive events that fall within 
the same moment would therefore not be perceived. We cannot 
ignore Stroud's suggestion: A temporal "grain size" of 100 
msec would be large compared to the roughly 33 msec frame period of 30 Hz TV,
so it is not a proposal that we can dismiss simply as being on 
the wrong time scale (i.e., as we could so ignore it if, for 
example, the perceptual moment were only about, say, 5 msec, 
which would be far too short to interact with the periodicity 
of the TV frame rate). 


There are, in fact, some phenomena (e.g., subjects' 
estimates of the "numerosity" of rapidly repeated lights or 
sounds, the distribution of reaction times, the speeds at which 
subjects perceive successive events as being simultaneously presented,
etc.) that suggest that a moment of 100 msec is indeed
in operation (259, 50, 49, 257, 258, 88), and there is some 
reported relationship between subjects' performance in regard 
to those phenomena, on the one hand, and the same subjects' 
alpha rhythm rates. Alpha waves are bioelectrical fluctua- 
tions of about 100 msec period, 50 microvolts in magnitude, 
and an only roughly sinusoidal form, that can be measured under 
certain "no-load" conditions - notably in inattention, and, 
especially, when no purposive eye movements are being made (or, 
perhaps, when no eye movements are being planned). These waves 
have been proposed to be a sort of "read-out scan" of percep- 
tual content, (analogous to the way in which a TV camera scans 
a scene), so the temptation of relating them to something like
the perceptual moment theory is evident. (In a proposal that 
is very close to Stroud's, Kristoffersen suggests that there 
is a minimum time needed to switch attention from one object 
to another: in fact, he infers a switching time of 50 msec 
from the analysis of experimental data on subjects' abilities 
to detect that two stimuli were presented successively rather 








than simultaneously, and he finds that each individual sub- 
ject's switching time is quite close to the half-period of his 
individual alpha rhythm (165, 166). 


But there are many problems in the way of accepting these 
arguments and data (53); there are many other sources of perio- 
dicity besides alpha waves (259, 168); and there are other 
plausible models of perceptual sampling that do not require so 
drastic an assumption as a discrete perceptual moment. (A 
particularly interesting one for video purposes is D. Allport's 
"traveling moment" model, which is a "window" in time that moves 
forward continuously, rather than discretely; this model cor- 
rectly predicts how periodic displays within the domain of those 
used in TV (particularly, within the domain of animated and
computer-generated productions) are perceived, whereas the discrete
perceptual moment model does not.) 
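
The difference between the two models can be made concrete with
a small sketch. It is a deliberately simplified reading of both
proposals, under our own assumptions: the discrete model is
treated as a division of time into fixed 100 msec bins with an
arbitrary starting phase, and the traveling model as a 100 msec
window that slides forward continuously. The function names are
hypothetical, and the sketch makes no attempt to reproduce the
experimental predictions discussed above.

```python
MOMENT_MS = 100.0  # moment duration suggested in the text


def same_discrete_moment(t1_ms: float, t2_ms: float,
                         phase_ms: float = 0.0) -> bool:
    """Discrete model: time is cut into fixed bins; two events are
    not resolved as successive if they fall in the same bin."""
    bin1 = int((t1_ms - phase_ms) // MOMENT_MS)
    bin2 = int((t2_ms - phase_ms) // MOMENT_MS)
    return bin1 == bin2


def same_traveling_moment(t1_ms: float, t2_ms: float) -> bool:
    """Traveling model: the window slides continuously, so two events
    closer together than its span lie inside it at some instant."""
    return abs(t1_ms - t2_ms) < MOMENT_MS


if __name__ == "__main__":
    for t1, t2 in [(10.0, 95.0), (90.0, 110.0), (0.0, 150.0)]:
        print(t1, t2,
              "discrete:", same_discrete_moment(t1, t2),
              "traveling:", same_traveling_moment(t1, t2))
```

In this reading the two models disagree about events such as
those at 90 and 110 msec: only 20 msec apart, they straddle a
bin boundary in the discrete model but lie well within the span
of the traveling window. In the discrete model the outcome for a
given pair also depends on the phase of the bins, which the
traveling model avoids.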


What this line of inquiry does suggest to us, however, 
is that there are basic cutting-rate limits that should be 
investigated, and that it pays to experiment with repetitive 
displays, particularly in the 50 and 100 msec ranges, and to be 
alert to special perceptual effects that arise under those con- 
ditions. And not only perceptual effects can be expected: 


A particularly interesting experiment, performed in the line
of research discussed above, was an attempt (partially success- 
ful) to change subjects’ alpha rhythms and to measure whether 
their reaction times (which have previously been shown to be 
correlated with alpha rates: 229) change accordingly. Alpha 
was changed by what is known as "photic driving," i.e., by 
presenting lights that flashed at some rate close to, but dif- 
ferent than, the subject's own alpha rate. Very few subjects’ 
alpha rhythms could be changed, and then only by a small amount; 
in those subjects the expected reaction-time effects were obtained.
But the possible consequences of the fact that photic driving 
occurred in even small amounts and for some people, should be 
considered separately: 


(1.2.2.2) Alpha driving and perceptual enhancement 


Surwillo's attempt at changing alpha rhythm by photic 
driving (229) is part of a varied set of attempts to use stro- 
boscopic lights to change or to "lock" brain functions (247, 
228, 121); in James Bond types of fantasy, photic driving has
been used as torture or proposed as a weapon (see, by coincidence,
Ithaca Journal, Wed., July 25, 1973, p. 18). As we
noted above (1.1.1.2), Bartley proposed that brightness 
enhancement occurs with flickering lights in the 10 Hz range
because the viewers' alpha waves are driven. However, Kohn and 
Salisbury (161) measured the EEG at frequencies in which brightness
enhancement occurred, and found no relationship between
frequency-specific activity of the EEG and brightness enhance- 
ment. Nevertheless, photic driving of alpha (and other modes 
of enhancing alpha: cf. 150) does indeed occur, with effects 
ranging from drowsiness (2) and "meditativeness" (150) to
nausea, although one major study found no adverse reactions
obtained with "normal" subjects after "long" (approx. 2 hours)
exposures to lights that were varied in frequency from 5 to
15 Hz (2).


The nature of the brightness enhancement remains to be 
determined, and other explanations have been offered (e.g., in 
terms of the growth of inhibitory processes, etc.). The ques- 
tion of "driving" repetitive physiological processes by the use 
of repetitive sensory stimuli remains of interest in under- 
standing cinema and video, however, and one well-documented 
danger must be considered: that of inducing epileptic convul- 
sions. 


(1.2.2.3) Stroboscopic triggering of epileptic seizures 


As we have noted, something that looks like alpha driving 
does indeed seem to be possible to some extent. But it is hard 
to see how such driving could be the direct mechanism by which 
epileptic seizures are triggered by flickering lights when we 
consider the frequencies (see below) that will trigger the 
seizures. The flicker rates produced by TV are definitely 
within the range in which epileptic convulsions are induced 
(e.g., when poor adjustments produce "flopover," danger appears 
to be particularly high, perhaps more so because of the strong 
vertical component). Thus, using EEG indices of seizure, For- 
ster et al (81-86) established the sensitivity ranges of their 
subjects to run from between 1.5-30 Hz for some patients to 
between 15-45 Hz for others; parts of these ranges are surely 
beyond alpha rate. Moreover, Forster and his colleagues have 
shown three things that may make their report particularly 
important to the video media in general: 


(a) Epileptic sensitivity to flicker-induced seizure 
can be extinguished (or greatly reduced) by (among other things) 
making the flicker unnoticeable by adding brighter nonflickering 
light to "mask" the flicker and then gradually lowering the
nonflickering light's intensity until, after some relatively small
number of such conditioning trials, the original unmasked
flicker no longer induces convulsions. 


(b) The extinction described above could be conditioned 
to an auditory click, which could then be administered to the 
(discharged) patient automatically by photocell-activated eye- 
glasses whenever flickering light in the range to which he had 
been sensitive was encountered. 


(c) The subject's sensitivity remains decreased mainly 
in the region of the frequency that was treated, leaving un- 
affected the more distant parts of the range of frequencies to 
which he is sensitive.


All of this sounds sufficiently important that one would 
want replication before accepting these reports as reliable. 
Only the conditionability as such has been replicated (82, 87,
86, 83, 81, 85, 84). 


To these phenomena, indicating that seizures can be con- 
ditioned and extinguished, add the following: the evidence that 
the subject's preparation to execute eyemovements is speci- 
fically involved in blocking and eliciting alpha (62, 187); 
that alpha (and alpha-blocking) is readily conditionable (30, 
254, 72, p. 137, 136); that eyemovements are affected both by
flicker and by certain kinds of patterns (190), and that epileptic
individuals have been reported who were specifically sensitive
to certain visual patterns, but not to the Lambda
activity produced by scanning those patterns (260, 30) (that is, 
not sensitive to the brain waves that are produced by the visual 
response per se: 91). Taken together, these facts suggest to 
us that it is via repetitive eyemovements (or via the efferent 
programs by which such eyemovements are planned and executed, 
perhaps brain-stem initiated and elicitable by eyeclosure; cf.
25, 62, 104, 187, 197, 251), that the stroboscopic light trig- 
gers epileptic seizures. This consideration in turn suggests 
to us that normals will be unaffected, in any way that re- 
sembles the epileptic patient's response, by such repetitive 
stimuli. But it also suggests a new danger for the epileptic: 


There is good reason to believe that when one looks at 
a moving object or surface through a relatively small aperture, 
repetitive eyemovements are made (10, 100). When a long pan 
or dolly is shown on a TV screen, a very similar stimulus situa- 
tion obtains to such aperture viewing, because the screen 
itself is essentially a small window, or aperture. This is 
particularly true in cheaper forms of animation in which much
of the action consists of pulling a scene past the camera eye. 
Such displays (particularly those that would induce vertical 
movements or components) may therefore also be potential trig- 
gers for convulsion, directly or by conditioning, even if no 
flicker is present. We know of no data to this point. 


More research is needed in the area, both in relation 
to the above suggestion, and with regard to the general ques- 
tion of what normal (non-clinical) viewers experience when 
they are shown photic stimulation that induces either repetitive 
eyemovements or strong EEG responses. Time-locked computer 
analyses can now be undertaken, for relatively small invest- 
ments, in which stimulus periodicity, eyemovements, or some 
conjoint measure of the two provides the baseline for averaging 
evoked cortical potentials. The fact that convulsion-sensitiv- 
ity to flicker can be conditioned (and hence can be extinguished) 
in susceptible epileptics is both a hopeful sign, and a source 
of danger in view of the widespread exposure to video displays 
from the cradle on.


(1.2.3) Other physiological responses to repetitive
and stroboscopic stimulation


In addition to brain waves, other physiological effects 
may be expected to occur in response to the repetitive aspects 
of the TV display (in the 10 Hz range) which may either be 
desirable, unavoidable or both; and they may occur in response 
to view-changes (cutting-rate), which may have to be particu- 
larly rapid in the video media in general and on the home TV 
screen in particular (see pp. 57f.), probably in the 10 - 0.2 
Hz range. 


(1.2.3.1) Conditioning of responses to repetitive
stimuli


Conditioning of physiological responses has been demon- 
strated mostly for operant responses in recent years (108, 115, 
109) but Pavlovian conditioning occurs as well. The fact that 
physiological responses appear to be subject to pervasive con- 
ditioning; that some physiological responses are known to 
occur in the naive organism in response to sensory change (167,
p. 211); that the physiological response systems have ample 
opportunity to become "signals" for each other (167, 109), and 
that therefore the sensory events that originally elicited those 
responses can likewise become signals for each other, and hence 
can become conditioned stimuli for responses with which they
had not otherwise been connected; and the fact that the great 
number of such "conditioning trials" that would occur in the
course of a single hour's watching of TV would provide the op- 
portunity for "overlearning" to occur (e.g., 10 x 60 x 60, or 
36000 times in an hour for a 10 Hz event) -- these considera- 
tions lead us to expect to find that each viewer has a mix of 
physiological responses that have become conditioned to various 
stimulus-repetition rates. Questions of timing become crit- 
ical -- we cannot expect such effects throughout the spectrum 
of repetitive stimulation. (Moreover, change of rate probably 
also becomes important, in the sense that habituation and ex- 
tinction can be expected if the time at which the next event 
will occur becomes well learned (see pp. 7, 51), so that "pace 
changing" seems likely to be an essential part of such inquiry.)


The method by which to study such effects now seems 
accessible: the time-locking procedure that we mentioned 
above in connection with CNV (p. 7), in which each stimulus 
change becomes the starting point from which a given physio- 
logical response is measured, and the sequence is averaged 
over many cycles at that repetition frequency, bringing out 
the characteristic sequence of events and reducing the acciden- 
tal "noise" (Fig. 3). 
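

The time-locking procedure described above can be illustrated with a
minimal sketch. It is not the authors' analysis program: the sampling
rate, the noise level, the shape of the simulated response, and the
names time_locked_average and simulate_recording are all assumptions
introduced for illustration. Each stimulus change marks the start of
an epoch, and the epochs are averaged sample by sample.

```python
import math
import random


def time_locked_average(signal, onsets, window):
    """Average the `window`-sample epochs of `signal` that begin at each onset."""
    epochs = [signal[t:t + window] for t in onsets if t + window <= len(signal)]
    if not epochs:
        raise ValueError("no complete epochs to average")
    return [sum(e[i] for e in epochs) / len(epochs) for i in range(window)]


def simulate_recording(rate_hz=10.0, fs=1000, seconds=60, noise_sd=2.0, seed=0):
    """Build a noisy record containing one hypothetical response per stimulus change."""
    rng = random.Random(seed)
    period = int(round(fs / rate_hz))      # samples between stimulus changes
    n = fs * seconds
    onsets = list(range(0, n, period))
    # Hypothetical response shape (an assumption, not a measured waveform).
    template = [math.exp(-i / 20.0) * math.sin(math.pi * i / 40.0)
                for i in range(period)]
    signal = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    for t in onsets:
        for i, v in enumerate(template):
            if t + i < n:
                signal[t + i] += v
    return signal, onsets, template, period


if __name__ == "__main__":
    signal, onsets, template, period = simulate_recording()
    average = time_locked_average(signal, onsets, period)
    worst = max(abs(a - t) for a, t in zip(average, template))
    print(f"{len(onsets)} cycles averaged; largest deviation from template: {worst:.3f}")
```

With independent noise, the residual noise in the average falls roughly
as the square root of the number of cycles averaged, which is why the
large number of repetitions available at these rates is an advantage.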


We know that an increase in GSR has been reported to 
occur with short latency in response to sensory change (see 
p. 7); if the GSR declines again with the same speed (100-200 
msec) with which it increases, stimulus changes at approx- 
imately 0.5 Hz should keep the response maximal. With CNV, 
heart deceleration, etc., the time course is much more complex, 
and their interactions with each other need to be considered. A
prime research project here would be to investigate the effects 
of change-rates in the neighborhood of 0.2 cps (ca. pulse rate) 
as a recruiting stimulus, with an aim of bringing cardiac ac-
celerations and decelerations under the control of the video
display. Periodicity alone is, of course, insufficient to 
define what will maintain and elicit response: GSR, for 
example, varies with the physical characteristics of the 
stimulus and of the change in stimulation as purely sensory 
factors (e.g., size, brightness, etc.). But these factors 
may, to some unknown extent, themselves be the expression of 
(or at least be affected by) cognitive expectations about what 
the stimulus will be like (e.g., unexpectedness and "adaptation 
level"; see also pp. 15, 51). And, of course, GSRs are a func-
tion of emotion, of preference and of arousal: emotion-pro-
ducing words and pictures were among the earliest and the most
persistent stimuli to which the GSR was obtained in psycholog-
ical research. With repeated presentation of the same stimulus,
habituation occurs, and the stimulus ceases to elicit the GSR.
We know of no systematic study of GSR as a function of flicker. 
Nevertheless, the following seems plausible: At reasonably
slow rates of flicker, each onset and offset should affect GSR,
and, because the rate is slow, partial recovery from habitua- 
tion can occur between flashes. Habituation to the entire cycle 
will probably eventually occur, but much more slowly than it 
would occur with a continuous light of the same duration. But 
at faster rates of flicker, the stimulus changes should occur 
too rapidly for recovery from habituation to occur between 
flashes, and therefore habituation to these faster rates should 
be similar to that which occurs with continuous (non-flickering) 
stimulation. Regardless of the initial maximal GSR levels, 
therefore, there should be some range of flicker in which habit- 
uation should be slowest to build up, and in which arousal should 
be maintained longest. 


If we assume that it takes arousal level 100 msec to 200 
msec to return to baseline, using the extraordinarily low- 
latency response reported on page 7 as our measure of arousal, 
then the range in which flicker should be maximally arousing 
(other factors equal) should be about 5 to 10 Hz: a point that
may be of considerable importance (see 4.1, below), and towards 
which research should be directed. Changes in rates should 
also be effective, and so should a shift in what site or part 
of the retina is being stimulated by a given flicker rate.
Such site-change can either be directly produced by changing
the interruption rate separately in different parts of the display,
or might be produced indirectly by causing the viewer to look 
at a different part of the display (e.g., by introducing a 
fixation-catching stimulus which, by causing him to move his 
fovea to that point in the field, will bring some other part 
of the retina to some desired part of the display). We don't 
know that this has been tried at all, either as an artistic 
experiment, or as a research tool, but it sounds like a poten-
tially effective way of avoiding habituation.
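

The 5 to 10 Hz estimate given above follows if the assumed recovery
time is taken as the period of one flicker cycle. The sketch below
makes that arithmetic explicit; the function names are hypothetical,
and equating recovery time with cycle period is an assumption about
how the estimate was reached.

```python
def rate_for_recovery(recovery_ms):
    """Repetition rate (Hz) whose period equals the assumed recovery time."""
    return 1000.0 / recovery_ms


def arousal_band(fast_ms=100.0, slow_ms=200.0):
    """Lower and upper flicker rates (Hz) implied by a range of recovery times."""
    return rate_for_recovery(slow_ms), rate_for_recovery(fast_ms)


if __name__ == "__main__":
    low, high = arousal_band()
    print(f"recovery of 100-200 msec implies roughly {low:g} to {high:g} Hz")
```

A longer assumed recovery time lowers the band proportionally, so the
estimate is only as good as the latency figure on which it rests.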





(1.2.4) Effects of attribution and labeling 


According to an increasingly popular view, "emotions"
and "affect" are not physiological states: They are names or 
labels that the subject applies to a perceived situation. 
That situation includes the state of his physiological indi-
cators. If these appear to be in a clearly abnormal state,
the subject must explain that fact to himself. For example,
Valins et al. (244, 242, 241) have shown that changing the
rate of what subjects thought (erroneously) were their own
amplified heartbeats, while they watched some pinup-photos and
not others, increased the judged attractiveness of those
pinups. Presumably, the process can be summarized as follows:
"If my heart rate changes when I see this girl, it must be
because she is more attractive than the others." This line of
analysis, which was initiated largely by Schachter and his col-
leagues (89, 91) and applied to a wide variety of motivational
situations (88, 92) (and which is a revitalization of the old 
James-Lange theory of emotion) is burgeoning today as an im- 
portant part of "attribution theory" (144) in Social Psychology. 
We need not take seriously the apparent implication that a con- 
scious decision-making mental process intervenes between stim- 
ulus and emotion: perception is full of examples of what look 
like judgmental or inference-like processes, but are completely 
"unconscious" and better expressed as "contingent responses." 
And it may also be that genuine changes in physiological state 
occur in response to the change in auditory stimulation in 
Valins' experiment (cf. p. 12, above; also 45) or, alterna-
tively, that the change in the simulated heartbeat stimulation
changes the degree and the nature of the viewer's attention 
(243). But the rubric of "attribution" appears to be good 
enough at this stage for us to look for similar phenomena in 
the area of response to repetitive stimulation, with which we 
are concerned here (although attribution theory's most obvious 
applications lie at the upper levels of communications theory, 
in the interpretation of portrayed motives, intentions and 
interpersonal behaviors -- i.e., in the as-yet untouched 
research area of the perceptual analyses of charisma, "pres-
ence" and acting). If we do attempt to apply the Schachter-
Valins model here, the recruited and conditioned physiological
indices would seem a good starting place. And even if the 
latter are not reliable phenomena, the use of spooky music in 
scary scenes, the use of sudden noise to increase the dramatic
effect of the acted content -- these traditional devices may
merely be clumsily worked-out precursors of Valins' experiment.


This area demands research with the highest priority. 


(1.2.5) Cognitive vs. sensory "surprise" 


Change in stimulation (and unexpectedness of change) thus 
may affect emotion and attention in fundamental ways, regardless 
of the substantive (story) content of what is being portrayed.
But such factors cannot be approached in a simple, mechanical 
manner: As we note below, "unexpectedness" or "change" are
cognitive events that are in part separable from purely sen- 
sory changes, in that an abrupt sensory change may occur 
without a change in meaningful content (e.g., a sudden bright- 
ness change; or a perfectly comprehensible viewpoint-change 
within the same scene) -- and vice versa (cf. pp. 38, 56). In 
our normal commerce with the visual world such separation be- 
tween sensory and cognitive change occurs rarely if ever, but 
as we shall see it is very easy to separate the two levels of 
change in cinema and video. Factors that affect the viewer's 
comprehension of (and cognitive expectations about) a sequence 
of pictures must therefore be taken into account separately in 
dealing with video (and with motion pictures in general, al- 
though there are special considerations that apply to video 
displays), and it is very likely that such factors as optimal 
timing, habituation, duration of effect, etc., are not the same 
for those physiological responses that are elicited by the cog- 
nitively unexpected or incomprehensible changes as they are 
for those physiological responses that are elicited by purely 
sensory changes. We will consider possible interactions be- 
tween these two domains of factors in 4.1 below. But we 
approach this problem also when we consider the possible effects 
of regional flicker on comprehensibility, which we do in sec- 
tion 1.5 below. 


(1.3) "Eye-catching" effects (peripheral conspicuity) 
and eye deployment 


(1.3.1) The functional division of the eye and its 
relation to eye movement 


We should note that the eye is spatially differentiated 
into two functionally separable regions, and that the field of 
view must also be so analysed because these regions respond 
differently to one and the same stimulus pattern. The fovea 
(the center of the retina), which is highest in acuity and most
complete in color-receptors, and which is normally considered 
to be of 2° - 4° in extent, is used for detail vision. The 
periphery, the remainder of the field of view, has much lower 
acuity (acuity falls off very rapidly outside the fovea: 212, 
157), has much more in the way of interactions between adjacent 
regions (i.e., the induced contrast between dark and light re- 
gions appears to extend over wider ranges in the periphery: 243), 
and has only partial color sensitivity (the retina is divided 
into zones that are decreasingly sensitive to full color dif-
ferences as we go into the periphery, 34). These differences
between fovea and periphery are not noticeable in our normal 
perceptions because the eye moves with great ease to bring 
whatever one is next interested in looking at from the peri- 
phery to the fovea: The very impulse to know what a given 
object in peripheral vision looks like brings it to foveal 
vision. Because the fovea is so small compared to the total 
visual field the normal perceptual process requires the eye to 
sample the field by bringing successive parts of it to the fovea. 
It does this, not by a systematic and invariant scanning raster, 
but by using information received in the periphery to guide the 
succession of saccades (the information-gathering glances) that 
are made until the viewer has received enough information about 
the display to satisfy whatever perceptual-cognitive task he
has set for himself (or that the visual display has set for him).
Such saccades are ballistic in nature: i.e., the eye movement
is preprogrammed to fixate some point in the periphery, so the
limitations of peripheral vision, and perceptual habits of using 
it, are the primary determinants of visual information pickup. 
Although the periphery is so important to visual perception, we 
know little of what and how it makes its contribution, and of the 
special requirements of TV, which uses the periphery in a spe- 
cial way: The normal field of view is about 180°; the motion 
picture theater presents a field of view that varies very widely, 
but one of about 25° - 45° seems a reasonable expected value; 
whereas the video field of view (assuming a screen of say 1.5 
feet, and a viewing distance of about 6 to 12 feet) may be
about 7° - 14°. (We really need information very badly about
what viewing distances we can expect to encounter with what
screen sizes; and with what age of viewer and socio-economic
status, since these will affect this figure, and the possible
consequences of different viewing distances recur throughout
this survey -- cf. pp. 29, 32, 59). We don't know much about
the behavior of the visuomotor system when viewing the world 
through such a small "window" as a 7° - 14° display, or smaller. 
There is some reason to believe that the eye does move around
a great deal, even within such a small confined area (a point
to which we return later, p. 55ff.), but it is surely not safe
to assume that it does so in the same way that it does in free
gaze or even that it behaves as it does in viewing projected
motion picture films (cf. p. 59). In any case, it is clear
that a great deal of what the periphery normally contributes to
help guide the gaze, and that keeps at least the gross outlines 
and relative location of an object in view even after the fovea 
has stopped looking directly at it, is missing from normal 
video viewing. We must expect that there will be differences
in whatever perceptual functions would be affected by these
facts (cf. pp. 58ff.).
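

The 7° - 14° figure for the video field of view can be checked from
the stated screen size and viewing distances. The sketch below is
illustrative only: it assumes that the 1.5 feet is the extent of the
screen along one axis and that the eye lies on the perpendicular
through the screen's center. The function name visual_angle_deg is
hypothetical.

```python
import math


def visual_angle_deg(size, distance):
    """Angle (degrees) subtended by an extent `size` seen from `distance`.

    Both arguments must be in the same unit; the eye is assumed to lie on
    the perpendicular through the center of the extent.
    """
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


if __name__ == "__main__":
    for distance_ft in (6.0, 12.0):
        angle = visual_angle_deg(1.5, distance_ft)
        print(f"1.5 ft screen at {distance_ft:g} ft: {angle:.1f} degrees")
```

Because the angle shrinks almost in proportion to distance at these
values, doubling the viewing distance roughly halves the field of view,
which is why the missing data on viewing distances matter.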


(1.3.2) Peripheral conspicuity 


Although CFF is even lower for peripheral than it is for 
central vision (see p. 4), flicker rates produced by poor TV 
set adjustment, by edge effects, and by special effects (cf. 
1.4, 1.5, below) should be well within detectable flicker range. 
Studies of conspicuity (cf. p. 5) have mostly investigated such 
factors as the time it takes to pick out a flashing light from 
among steady ones (or from among other flashing lights). We 
need research on the extent to which a TV screen, viewed peri- 
pherally, forces fixation to occur as a consequence of flicker, 
and how central that forced fixation is. (And we should note 
that 15° is a very small part of the effective visual world, 
so that if chance were the only determinant of whether or not 
we look at the TV screen it should usually be well outside of 
central vision.) In any case, even without flicker, the TV 
screen (with its internal luminosity and movement) fulfills 
most of the prescriptions for catching the eye that are listed 
by advertising psychologists (cf. 114), so that it might be 
that there is little to be done to a display that will addition- 
ally enhance its conspicuity. Those advertising prescriptions, 
however, are based largely on the old Cornell work on attensity 
(reviewed in 44), and on recall studies which test whether sub- 
jects have attended to advertisements by how well they remember 
them, a very dubious measure for our purposes, and may not 
therefore be directly applicable to this question. A concealed 
VTR camera (optically sited at the center of the video display
by half-silvered mirror, or other beam-splitter) should permit 
sampling of when a free observer, with no constraints on pos-
ture or movement, looks toward the screen.


About 1° accuracy is all such a situation would require. 
Sampling of the VTR tape would permit us to assess the degree
to which actual gaze-capturing is affected by various display 
factors. Presumably, the greater the peripheral conspicuity 
of the display, the higher the proportion of time that the dis- 
play is fixated, regardless of its substantive interest. How- 
ever, we have no knowledge at present of habituation and fatigue 
factors, nor of the degree to which peripheral conspicuity may 
be counterproductive (because of irritating or optically dis- 
appointing effects when the peripherally-viewed displays are 
brought to foveal vision). Research would be easy to implement 
and inexpensive in apparatus, but might be expensive with
respect to any large-scale reduction and analysis of data. 


(1.4) Interactions with eyemovements 


Even well above CFF, stroboscopic interruption should 
interact with eyemovements, in ways that make such displays de- 
tectably different from uninterrupted ones, and that may give 
them special significance in more disturbing ways. 


(1.4.1) The effects of persistence 


The motion and continuity that are perceived in strobo-
scopic pictures (which are successive static displays, alter-
nating with short dark periods), are, of course, "illusions,"
a fact that we tend to overlook because of the familiarity of
the medium. We do not know the visual mechanisms by which that 
"illusion" is achieved. "Visual persistence" is often invoked 
as an explanation, but it will not do: Although persistence 
might indeed be used to account for the fact that the dark
periods are not separately detected as such at high enough 
rates (i.e., at rates high enough so that flicker will not be 
seen with alternating, black fields), persistence cannot account 
for the integration of successive displaced contours into smooth 
movement. Indeed, all that persistence would do is to super- 
impose the successive frames to form an incomprehensible multiple 
exposure. That is, with stationary eyes, persistence (which
is real, and important, and particularly important to any
eventual understanding of flash frame effects (cf. 3.2.2, below) 
and to an understanding of why the following kinds of inter- 
ference do not normally occur) would merely produce adjacent 
multiple images. With moving eyes, that fact becomes clear:
If the eyes move relative to a stroboscopic (interrupted) dis-
play, when the eye has stopped the afterimages of each point
at which the light impinged on the retina will have left a
track of persistent points separated by gaps that correspond
to the times at which the light was interrupted. Thus, a
moving luminous point that appears to the stationary eye to be 
in smooth motion will be "dissected" into a set of static points 
if the eye moves (cf. 38, 67, 158, 240). Some form of inhibi- 
tion must normally suppress these multiple images. (There is 
some period of suppression of vision before and during a saccade. 
The extent and basis of that suppression is not clear, however, 
(176, 185, 186, 225, 137, 245); it may not apply to smooth pur- 
suit movements at all (225); and we are here discussing after- 
images which persist after the eyemovement has stopped, and 
the eye has come to rest, in any case -- a point often over-
looked in connection with this issue.) If we knew the distance
past which the inhibitive effect (if it holds in a moving eye)
breaks down, and if we knew the velocity of the eye, and the
interruption-rate of the display, we should be able to say 
precisely the sizes and distance at which the dots would appear. 
The effect is somewhat like an uncontrolled version of McLaren's 
Pas de deux, if the field is relatively uncluttered so that each 
afterimage is not overlaid by contours produced by other objects. 
Whereas a set of contours that overlay each other (for example, 
the gridwork of the raster pattern) will, if they are regular 
(or if they have any regular component, as appears to be true
of "snow" -- cf. McKay) cause successive moire patterns, which
will themselves appear to move. All of this is something like
"picket fencing" under strobe lights, except that it is dependent 
on the eyemovements of the observer. Computer-generated dis- 
plays should be particularly prone to such effects: The 
representation of rapidly moving objects, that are displayed 
with high-contrast on otherwise-empty fields, which induce 
tracking movements of the eye, should be avoided if this form
of visual "beats" is to be minimized. More difficult to avoid
is a similar effect, on a smaller scale, that we discuss in
2.1 and 2.4, below.
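

The dependence of the dot pattern on eye velocity and interruption
rate, noted above, reduces to simple arithmetic if the eye is assumed
to sweep at constant angular velocity across a stationary point of
light. The sketch below rests on that assumption and on hypothetical
example values (300 degrees per second, 60 interruptions per second,
1 msec of light per cycle); the function names are likewise
hypothetical, and no inhibition of the afterimages is modeled.

```python
def dot_spacing_deg(eye_velocity_deg_s, interruption_rate_hz):
    """Retinal distance (degrees) between successive afterimage dots."""
    return eye_velocity_deg_s / interruption_rate_hz


def dot_length_deg(eye_velocity_deg_s, on_time_s):
    """Retinal smear (degrees) of one dot while the light is on."""
    return eye_velocity_deg_s * on_time_s


if __name__ == "__main__":
    velocity = 300.0   # hypothetical eye velocity, degrees per second
    rate = 60.0        # hypothetical interruption rate, Hz
    on_time = 0.001    # hypothetical light-on time per cycle, seconds
    print(f"spacing between dots: {dot_spacing_deg(velocity, rate):.2f} degrees")
    print(f"length of each dot:   {dot_length_deg(velocity, on_time):.2f} degrees")
```

Faster eye movements or slower interruption rates spread the dots
farther apart; longer light-on times stretch each dot into a streak.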


(1.4.2) Interference with oculomotor coordination


Saccadic movements are preprogrammed rapid excursions of 
the eye that bring some point in peripheral vision to the fovea 
(p. 17). Although the excursion itself is very rapid (ca. 50 
msec), a minimum of about 150-200 msec preparation time is 
needed before the saccade can be made, and therefore saccades 
cannot be executed with a frequency greater than 4-5 Hz. If 
unrelated visual stimuli are flashed before the eye at 4-12 Hz, 
normal saccades stop occurring and the eye appears to freeze 
(52, 203) presumably because insufficient time is then provided 
for planning the next saccade (and for assimilating the informa- 
tion of the previous one). In fact, if we assume that an inter- 
ruption or discontinuous change in scene will immobilize the 
eye for 100-200 msec above the normal 150-200 msec response time,
a possible explanation of the photic driving phenomenon (p. 9)
suggests itself, without recourse to brain-wave hypotheses. 
Within the range of 10-30 Hz, let us first consider the lower 
frequencies: Each time the eye prepares to move, it may re- 
ceive a signal that the scene has been interrupted (because the 
light has come on and gone off), and the intended saccade is 
aborted in accordance with the movement-freeze hypothesis dis- 
cussed above. Consider next the higher frequencies: Whenever 
an eyemovement is undertaken, and the eye has moved, it will
have received a new view from its new position, but with a 
dark period intervening, and both views will be superimposed 
because of the persistence (i.e., afterimages of both views 
will be seen). There is reason to believe that the eye does 
not immediately take its own shift into account for some
period after shifting (possibly up to 500 msec: 180); there
is reason to believe that a continuous visual framework is
necessary for the saccadic "suppression" to occur (43); and,
in any case, if the eye's movement straddles the dark period,
it will confront multiple images (cf. 1.4.1, above) and have
no way to distinguish the final image from the ones previously
received. If the visual system makes a wrong choice about which 
one is currently being received (and therefore about which way 
the eye is currently looking, inasmuch as its nonvisual sources 
of information about where the eye is pointed may be inopera- 
tive for some time after the movement has occurred), the con- 
fusion would only be increased by the time the next saccade is 
executed. Severe interference with the normal coordination of 
eyemovements, and with the integration of successive inputs into 
a single unified scene, should then result. 
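

The 4-5 Hz ceiling on saccade frequency stated at the start of this
subsection is consistent with adding the excursion time to the
preparation time and taking the reciprocal. The sketch below shows
that arithmetic; treating the two times as strictly successive is an
assumption, and the function name is hypothetical.

```python
def saccade_rate_ceiling_hz(preparation_ms, excursion_ms=50.0):
    """Highest saccade rate (Hz) if preparation and excursion follow one another."""
    return 1000.0 / (preparation_ms + excursion_ms)


if __name__ == "__main__":
    for preparation_ms in (150.0, 200.0):
        ceiling = saccade_rate_ceiling_hz(preparation_ms)
        print(f"{preparation_ms:g} msec preparation: at most {ceiling:g} saccades per second")
```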


In short, stroboscopic lights in the 10-30 Hz range, having 
detectable and substantial dark times between the flashes, may 
affect purposive eyemovements similarly to the way in which DAF 
(Delayed Audio Feedback) is known to disrupt purposive speech. 
And this interference should occur whether or not there is any 
photic driving effect on brain waves. To the uninformed and 
unpracticed subject, the results might well be disorientation, 
dizziness, nausea and worse (including effects on middle-ear 
function). 


Research should be relatively easy and inexpensive in 
this area. For one thing, the disruptive effects of interrup- 
tion in this range should be reduced or eliminated by producing 
a stable, nonflickering surround (which may at least in part 
account for the amelioration of epileptogenic seizure-driving 
described by Forster et al in Section 1.2.2.3, above). 


(1.5) Masking, lustre and new "surface colors"; sub-
jective colors


(1.5.1) Masking


The range of succession times employed in stroboscopic
video displays is within that in which a host of poorly under-
stood phenomena, variously called masking, metacontrast,
contour-assimilation, etc., may occur. In these phenomena,
shapes or patterns (or areas of color) that would have been per- 
fectly visible had they not been preceded or followed by some 
other display, fail to be perceived. Such masking phenomena 
may turn out to be undesirable side effects in artificially- 
assembled video displays. Recent reviews and papers (146, 181, 
202, 252) reveal a complex array of findings, not amenable to 
a general summary. At this stage, all that can be offered is 
a warning that sequentially-produced scenes, especially those 
produced in alternation on the same or adjacent areas, may
result in perceptual blanking of some of them, and that dis- 
plays that are built up from frame to frame (e.g., stop-action 
and flash frames) or that vary from frame to frame, should be 
carefully inspected with this warning in mind. 


Closely related to at least some of the masking phenomena 
are those that occur with equal-flux and "lustrous" displays, 
which we consider next. 


(1.5.2) Lustre and uniform (ambiguous) lightness 
displays 


When lights are flashed in close temporal and spatial 
contiguity, a number of unusual phenomena occur. In particular, 
if a given pattern is presented on the same site in reversed 
luminance distributions (e.g., a black pattern on a white 
ground, which alternates with its photographic negative), it is 
perceived as a lustrous, pearly white or gray ground, both fig- 
ure and ground being very vibrant and compelling. 


This phenomenon does not look totally unfamiliar: In 
binocular vision, a similar simultaneous presentation (and, 
perhaps, an alternation of dark and light via binocular 
rivalry) occurs when one looks at a specular (glossy) surface, 
since such surfaces characteristically reflect a highlight to 
one eye and not to the other eye at the same point on the ob- 
ject. The same thing occurs in miniature at each of the very 
small facets that make up a lustrous or iridescent surface
(e.g., mother-of-pearl; brushed aluminum; etc.) in normal bin- 
ocular viewing. Thus the lustrousness that appears in response 
to an alternating negative-positive display may occur because 
such displays simulate the binocular rivalry that occurs during 
binocular vision of lustrous surfaces. (And although this has 
never been studied, a similar effect also may occur, with head 
tremors, to produce the same effect in monocular vision. We 
should also note that brightness-enhancement is known to occur 
at the edge or contour of a flickering field (207, 208), in a
manner that seems closely related to the enhancement discussed
in 1.1.1.2, above; this brightness enhancement effect is far 
more pronounced when the two eyes receive different luminance 
levels (207).) It may thus not be a really "new" surface color 
that we see with such alternative video displays. But it is 
new insofar as controlled presentations are concerned, and we
have much to learn about how to produce the effect, and about
what its consequences are.


As far as production of the phenomenon is concerned, the 
"temporal modules" open to the normal video display (the units 
in terms of which sequences can be constructed) are based on a 
minimum full-raster cycle of 30 msec. We do not know what the 
required times for alternative negatives and positives should 
be, nor do we know (as we shall see) whether the same phenomenon 
can be achieved by alternating a pattern with a blank (pattern-
less) field of the proper luminance and timing; we do not know
whether the second field can have a different pattern on it, 
nor whether the two fields can consist of dot-matrices in nega- 
tive-positive alternation (perhaps with some phase difference 
from one part of the field to the next) with the contours being 
produced between those sets of dots that co-vary. (Such dis- 
plays, if they produce visible pictures at all should be 
strikingly "active" and iridescent.) Most of the detailed
research we have on the effects of cyclical temporal luminance 
changes on surface-color has been directed at two problems: 
one is the attempt to discover the conditions for generating 
what is called "subjective color" (i.e., hues that appear when
only sequences of black and white or light and dark are actually 
being presented; see 78 for a recent theory and review). The 
other is the important study by Sperling, 91, which has not 
been really followed up, on the luminance and temporal condi- 
tions under which a pattern of light, alone, cycling at about 
500 msec, followed by a black field, will produce one of the 
following: either the positive image of the pattern is seen; 
both a positive image and its negative afterimage are seen to 
alternate; a negative image appears alone (which is what most 
interested Sperling); or an image of ambiguous brightness-
distribution appears, which sounds very much like the lustrous
condition we are discussing in this section. Sperling reports 
that the relationship between the conditions (of luminance and 
timing) that produce these alternative percepts is invariant 
over a variety of patterns (although the actual luminance and 
time values may be different with different patterns). This 
seems to us to be an extremely important area of future re- 
search, particularly in view of the possible consequences of 
the use of displays that alternate within these ranges. Other
consequences are considered in 1.2, 2.4, and 2.5. One concern
we must have, of course, is the possibility that these displays 
are epileptogenic. 


(2) Video contours 


The world contains objects' surfaces, edges and corners, 
not outlines nor contours, although it is by the use of the 
latter that those surfaces, edges and corners must be represented
in pictorial displays: Outlines and contours in a
picture serve to mark off the two-dimensional projection of an 
object, in the scene being portrayed, from the rest of the 
visual field. In a high-fidelity surrogate (e.g., a good 
photograph), the contours are produced by changes in texture- 
density; by abrupt differences in surface color, or in surface 
texture; or by cast shadows. In drawings, which are extremely 
low-fidelity surrogates, objects' edges and corners are "repre- 
sented" by luminance-difference contours or by lines (which are 
doubled luminance-difference contours, very close together). 
Although outlines are nothing at all like objects' edges, we 
are so familiar with their use that it is hard to remember how 
different from edges they are; but although they are so dif- 
ferent, they probably share certain essential stimulus features 
with objects’ edges, and are not merely arbitrarily-learned 
symbols (129, 130). Video, as we shall see (3, below) is prob- 
ably a relatively low-fidelity medium for most things other 
than closeups (even though it can approximate the very high 
fidelity of photography, for certain combinations of camera 
distance, subject, screen-size and viewer-distance), and
computer graphics certainly rely heavily on lines and outlines.
Most of what we know about contours, we have learned in the 
study of luminance-difference contours. But in addition to 
these, there are varieties of contours encountered in video 
displays that are rarely encountered elsewhere, whose per- 
ceptual properties are largely unknown. We first discuss 
luminance-difference contours briefly, then the other varieties. 


(2.1) Luminance-difference contours 


A prerequisite for seeing shapes, objects, distances or 
anything substantial, is the existence of luminance-difference 
contours in the visual field. (E.g., if 2 regions differ only 
in their hue, and are equal in brightness, the contour that 
separates the regions does not in general have a clearly
discernible shape (176; cf. however 31)). Moreover, the luminance
must change abruptly (204). Because of this need for sharp 
change, if we defocus the image, or blur it in any other way,
or increase the incident light to both of the regions that are 
bounded by the contour, the detectability of the contour (69, 
193, 194, 195, 235) is decreased, and the apparent brightness 
of the stimulus also decreases (although some of these effects 
may disappear at short exposures: 132). Thus a sharp focus, 
reduced incident light falling on the video tube face, increased 
contrast (and any electronic amplification of the abruptness of 
the change in luminance gradient each side of the contour) -- 
all of these increase the apparent brightness of the objects, 
and increase the visibility of their contours, on the TV screen. 


But a luminance-change may be well above the degree of
abruptness that is normally sufficient for a contour to be seen, 
and yet the contour may not be perceived: Luminance-difference 
contours mask or inhibit each other when they are presented 
successively with intervals between their onsets that range up 
to about 150 msec (255), with a maximum masking effect between 
them at about 50 msec (20 Hz) reported by some (13, 163, 181, 
253). Neither is an abrupt luminance change necessary in order 
to see a contour, as we will see in connection with subjective 
contours (2.3, below), except in this sense: even subjective 
contours depend on groupings or arrangements of elements (e.g., 
dots) that are themselves bounded by well-defined
luminance-difference contours even though those contours do not
themselves bound or outline the object being represented.
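
To make the time scale of this masking concrete, the following
minimal sketch converts the onset intervals quoted above into
repetition rates and into a count of successive video fields.
The figure of 60 fields per second is an assumption introduced
here for illustration (it is not given in the text), and the
function names are hypothetical.

```python
# Assumption: a nominal field rate of 60 fields per second.
# This value is NOT stated in the text; change it as needed.
FIELD_RATE_HZ = 60.0


def interval_to_rate_hz(interval_ms):
    """Repetition rate corresponding to one event every interval_ms."""
    return 1000.0 / interval_ms


def fields_spanned(interval_ms, field_rate_hz=FIELD_RATE_HZ):
    """Number of successive video fields that fit in interval_ms."""
    return interval_ms * field_rate_hz / 1000.0


# Interval of maximum masking reported in the text (about 50 msec).
peak_rate_hz = interval_to_rate_hz(50.0)
peak_fields = fields_spanned(50.0)

# Upper limit of the masking range reported in the text (about 150 msec).
limit_fields = fields_spanned(150.0)

print("rate at maximum masking (Hz):", peak_rate_hz)
print("fields spanned at maximum masking:", peak_fields)
print("fields spanned at upper limit:", limit_fields)
```

Under that assumed field rate, contours appearing a few fields
apart would fall inside the reported masking range, which is why
the effect may matter for rapidly changing video contours.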


So luminance-difference contours are essential to perception
despite these qualifications. And sharpness of focus
is important to obtaining well-perceived luminance-difference
contours. Nevertheless, the visibility of overall patterns,
and of larger shapes, is probably actually increased by
blurring (as we shall see in 3.3, below).


(2.2) Moire and border effects 


When the eye moves rapidly, a stationary point of light 
at which the viewer is looking will stimulate a path or line 
on the retina, rather than a point (p. 19), yet our eyemove- 
ments do not normally cause us to see a tangle of lines -- a 
fact that is usually attributed to suppression of vision 
during eyemovements (see p. 20). Some researchers question 
the degree to which vision is suppressed immediately before 
and during voluntary and involuntary eyemovements (240), but 
in any case the aftereffects of having exposed the same part 
of the retina to two different luminance distributions result 
in partially overlapping "afterimages." This should occur with
each tremor that the eye makes; it should bypass the suppression
phenomenon; and it should have a wide variety of effects,
some of which may be important in video displays. 


(2.2.1) Border effects 


Consider a small eyemovement which causes a
luminance-difference contour (henceforth abbreviated l.d.c.) to be
displaced on the retina so that a strip which previously lay on
the dark side of the contour is now exposed to the bright side.
The strip that is freshly exposed to the bright side will look
brighter than any of the surrounding region, because in effect
the afterimage will be added to that of the incident light
itself. This super-bright stripe will move as the eye moves,
producing a fluctuating pattern of brightness bordering on,
and interacting with the perception of, the l.d.c.
The width of such border effects depends (among other things) 
on the eye tremors. We do not know what the latter are under 
TV-watching conditions. When an observer tries to keep his 
gaze steady under laboratory conditions, a number of different 
kinds of movement occur (7, 57, 190, 205): a high-frequency 
tremor (30-100 Hz, with a median amplitude at about 17 seconds 
of visual angle); slow drifts (1 minute of visual angle per 
second) over about 5 minutes of angle; rapid unperiodic flicks, 
from 1 to 20 minutes of visual angle in extent. The area which 
includes the center of the eye about 50% of the time is about 
10 minutes of visual angle (63), so we are talking about move- 
ments some of which are several times the width of a raster 
line at what we have taken to be the normal viewing distance 
(p. 4). It seems plausible that such tremors will occur with 
video displays at least as much as they do in laboratory condi- 
tions: There is reason to believe that the last two movements 
are corrective in purpose (57, 190), and if that is so, the 
minor perturbation or jitter that is characteristic of video 
(and especially TV) displays might mislead the visual system 
into acting as though drifts have occurred and therefore as 
though corrections are necessary (even though it is the dis- 
play and not the eye that has moved). Alternatively, it might 
be that eyemovements decrease in the presence of jitter. 
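
The comparison between eye-movement amplitudes and raster-line
width can be sketched numerically. The eye-movement figures below
are those quoted above; the viewing geometry (picture height,
number of visible lines, viewing distance) is purely hypothetical,
since the "normal viewing distance" of p. 4 is not reproduced in
this section.

```python
import math

ARCMIN_PER_RADIAN = 60.0 * 180.0 / math.pi


def raster_line_arcmin(picture_height_m, visible_lines, viewing_distance_m):
    """Visual angle (minutes of arc) subtended by one raster line."""
    line_height_m = picture_height_m / visible_lines
    angle_rad = 2.0 * math.atan(line_height_m / (2.0 * viewing_distance_m))
    return angle_rad * ARCMIN_PER_RADIAN


# Hypothetical geometry, NOT taken from the text:
# a picture 0.30 m high, 480 visible lines, viewed from 2.0 m.
line_width_arcmin = raster_line_arcmin(0.30, 480, 2.0)

# Eye-movement amplitudes quoted in the text, in minutes of arc.
eye_movements_arcmin = {
    "tremor (median amplitude, 17 sec)": 17.0 / 60.0,
    "slow drift (extent)": 5.0,
    "flick (smallest)": 1.0,
    "flick (largest)": 20.0,
}

print(f"one raster line: {line_width_arcmin:.2f} min of arc")
for name, amplitude in eye_movements_arcmin.items():
    in_lines = amplitude / line_width_arcmin
    print(f"{name}: {in_lines:.1f} raster lines")
```

With these assumed values a raster line subtends roughly one
minute of arc, so the drifts and larger flicks span several
lines while the high-frequency tremor stays within one line.
Other geometries will of course give other ratios.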


Research on these problems will need precision eyemovement
recording procedures, and cannot be undertaken inexpensively
nor in a casual setting. Perhaps the apparent brightness
of a display could be measured as a function of the degree
of jitter electronically introduced into the overall display, 
without attempts to measure the effects on tremor directly. 

We should note that the relative movement of the image over 
the retina is an essential part of the visual process: When
the retinal image is stabilized by optical methods (involving 
contact lenses), so that there is no visual consequence of 
tremor and small saccades, vision ceases. It is apparently 
the low-frequency components (10 Hz and below) that are needed
to maintain vision, and the higher frequencies merely blur the 
image (164). The normal movement is within the frequency range 
available to, and possibly inherent in, TV displays: So it is 
not implausible on the face of it that the jitter of the video 
display, in this frequency range, should be accepted by the 
visual system as being part of the system's own perturbation 
(an effect which seems more plausible to occur in a dark room 
than in a lit one, and with a large display than a small one). 


(2.2.2) Moire effects 


A moire is produced when a spatially repetitive pattern 
of l.d.c's (luminance-difference contours) is displaced rela- 
tive to its previous position and relative to the afterimage 
that it has left there. This is so because the displacement 
(whether by display-jitter or by eyemovement) results in the 
superposition of the new stimulus input and the old afterimage,
slightly displaced relative to each other. The points of inter- 
section between the two displaced sets of contours provide the 
repetitive elements for a new set of lines, which is present in 
neither set considered separately. The pattern formed by the 
new set of lines is the moire pattern (Fig. 4). The castella- 
tions and harmonics produced as side-effects by, say, chroma- 
keying, and the repetitive elements of the raster themselves, 
would seem to provide sufficient basis for us to expect moire 
patterns will be generated in the TV display. These moire 
patterns will move as the eye moves and trembles relative to 
the display, or as the display itself moves slightly as a 
whole. The moire patterns will move even more so (and much 
more than the eye moves) when the display itself contains a 
periodic movement, as is almost inevitable in video. (The 
purely subjective patterns that McKay (184) describes as being 
visible in visual noise or "snow" are probably examples of the
interaction of eyemovements, afterimages, and nonrandom l.d.c. 
changes in the snow.) 
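
As a rough illustration of how superposing two sets of repetitive
contours yields a much coarser pattern, the sketch below uses the
standard idealization of two straight-line gratings. This is a
textbook simplification offered here as an assumption: the text
describes an afterimage superposed on a displaced stimulus, and
the function names and numerical values are hypothetical.

```python
import math


def moire_period_parallel(p1, p2):
    """Beat period of two parallel line gratings with periods p1 and p2."""
    if p1 == p2:
        return math.inf  # identical parallel gratings give no beat pattern
    return (p1 * p2) / abs(p1 - p2)


def moire_period_rotated(p, angle_rad):
    """Fringe spacing of two equal gratings crossed at angle_rad."""
    if angle_rad == 0:
        return math.inf  # no rotation, no fringes
    return p / (2.0 * abs(math.sin(angle_rad / 2.0)))


# Hypothetical example: periods expressed in raster-line widths.
coarse = moire_period_parallel(1.0, 1.1)                # 10% mismatch
crossed = moire_period_rotated(1.0, math.radians(2.0))  # 2-degree tilt

print(f"parallel gratings, 10% mismatch: period {coarse:.1f} lines")
print(f"equal gratings crossed at 2 deg: spacing {crossed:.1f} lines")
```

In both idealized cases the resulting pattern is many times
coarser than either grating, which is consistent with the claim
that the moire is present in neither set considered separately.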


Video contours (and particularly those produced by special 
effects) therefore can offer space-time "beats," which can be
of higher contrast than is otherwise available on the video dis- 
play, and whose frequencies can be considerably lower than the 
stroboscopic interruption rates with which video displays are 
actually presented. These "beats" may have several
consequences -- in physiological and affective effects (see 1.2,
1.5), in subjective color (note that the conditions at what we
will call active contours are often right for generating
"subjective color"), etc. -- consequences that may be of
considerable importance, although that importance can only be assessed
by research: It may turn out after all that these effects are 
basically nondetectable, or, if they are detectable, that the 
perceptual system adapts to them without effects of any impor- 
tance. Research methods here are not obvious. They will 
themselves require further thought. 


The moires and border effects described above should 
occur whenever the subject's fixation is relatively constrained, 
when his eyes are well focused, and when the display luminance 
is high relative to the ambient room illumination. The first
two conditions may or may not be met in normal video viewing
(cf. 3.3 and 4.4, below); the last probably is usually met.
Let us consider the consequences of these effects that may 
occur at video contours. 


(2.3) "Active contours" and their possible effects


Because they depend on interactions between the nature 
of the display, the eyemovements that are executed, and the 
structure of the retina of the eye, the following flicker effects 
depend on the size of the elements in the proximal stimulus pat- 
tern (i.e., on the visual angles subtended at the eye) rather 
than on the distal stimulus object (i.e., the physical size of 
the display itself). For this reason, the vibrancy and contour- 
activity that we will discuss should be a function of the 
viewer's distance from a given display. Furthermore, the fovea 
(or center) of the retina and the periphery of the retina respond 
differently to these factors, if only because of acuity dif- 
ferences (but also because of their different sensitivity 
characteristics), and the extent of the picture that falls on 
each (on fovea and on periphery) will also change with viewing 
distance. As noted in reference to pointillism and subjective 
contours (2.4, below), this means that there may be several 
different "optimal" viewing distances to consider, and to com- 
pare with the actual average distances at which viewers tend to 
sit (when these are known; we do not now know them, nor how they 
differ for different age groups and socio-economic strata). 
Thus active contours should be studied separately under 3 view- 
ing conditions: when they fall in far periphery; when they fall 
in near periphery, and when they fall in central vision. We con- 
sider possible effects of each of these separately, and then 
their interactions. 


(2.3.1) Far periphery


There is an old belief to the effect that movement and 
change, peripherally-viewed, evoke reflexive fixation move- 
ments of the eye and head -- i.e., that movement and change 
serve to catch visual attention. Active contours would appear 
to be more attention-catching in this regard than normal static 
ones, but it seems unlikely that they will be effective in the 
far periphery (especially because the poorness of peripheral
vision is due not only to the way the retina is organized, but
also to the poorer performance of the eye's optical system in
the far periphery). But under normal viewing conditions, TV
displays must be much too small to fall into far periphery 
when the observer is looking at the set, and there usually is 
sufficient gross movement on the screen anyway to attract the 
observer to the screen when he is not looking at it. I.e., 
active contours are probably not particularly important as far 
as extreme peripheral vision is concerned. 


(2.3.2) Near periphery 


If the active contour falls close enough to the fovea 
to be detectably active (and it may turn out that this is true 
within the visual angle subtended by the whole TV screen, given 
normal viewing size and distance), then some or all of three 
consequences may occur: (1) The eye will be drawn to fixate 
that contour more strongly than it would be with an ordinary 
contour (e.g., than it would be attracted to a line on paper). 


(2) The apparent luminance and saturation of the two 
regions that are bounded by an active contour may be much 
higher than would be true if they were bounded by an ordinary 
contour: something of a jewel-like or stained-glass effect 
may occur (heightened by lustre-color, discussed above (1.5)), 
that resembles some of the descriptions of drug-induced visual 
effects (as may other aspects of contour-activity: cf. the next 
two points discussed below). What suggests that such heighten- 
ing of the entire bounded region will occur is the evidence that 
the color of a given region of the visual field is set by the 
contour, and is at least in part a "construct," filled in
over the entire area but really determined only by the pro- 
cesses that occur at the contour (cf. 164). This point has to 
be tested by color-matching research using active contours; if 
it is true, it should be of considerable interest in creating 
experimental displays. The research procedures here seem rela- 
tively straightforward. 


(3) The contour activity may stimulate the same
analyzing mechanisms or receptive fields as are activated by
real movement of contours in peripheral vision. There is some
evidence that receptive fields of this sort exist, i.e., that
our visual nervous system contains analyzing mechanisms that
are stimulated by moving contours in the retinal image. Such 
contour movements normally occur on the retina for any or all 
of three reasons: 


First, because the eye trembles and shifts continually 
by small amounts (see p. 26). Fixation is normally maintained
and corrected on the basis of visual information. If the
amplified contour-activity is interpreted by the
fixation-maintaining machinery of the visual system as being due to eye
movement and as a signal that fixation is not being adequately 
maintained, the system may take steps to stop what it inter- 
prets as being eye tremor. Automatic attempts by the visual 
system to stop that tremor will not, of course, stop the 
visual activity at the contour, under these conditions, and 
the attempts to do so may result in increased difficulty of 
maintaining fixation (see also p. 47). That is, it may be
easier to maintain a somewhat defocused fixation than to
maintain perfect, focused fixation, when active contours are
presented.


Second, movement in the retinal image obviously may
occur for either of two reasons: because an object moves 
relative to the eye and/or because one object moves relative 
to another object or to the object's background. If active 
contours have the same effect on motion "detectors" as does 
real movement in the retinal image, such contours may indeed 
trigger fixation "reflexes." The periphery is particularly
sensitive to movement, and might be sensitive to the pseudo- 
movement that we are discussing here, as well. 


Third, because active contours may be strongly
interpreted as being an edge between surfaces at different
distances. This point is less evident, but may be of importance:
With slight head movements, movement parallax (i.e., the dif- 
ference in views from two station points) will produce slight 
movements at the edges of objects where an abrupt change in 
distance occurs (e.g., objects as they are normally distributed 
in space around us: cf. 123). In addition, binocular parallax 
may produce an apparent alternating motion or activity in the 
combined (both eyes') view at an object's edge, in that two 
binocularly viewed contours in close proximity (ca. 15 min.
of visual angle or less) generate a piecemeal alternation
that may resemble videocontour activity, and may activate the 
same perceptual mechanisms. 


(4) If the viewer has moved his eye to fixate an active 
contour on a CRT with an "expectation" that movement and/or 
depth-at-an-edge will be found for reasons discussed above, 
that expectation will of course be disconfirmed. That is, when 
the eye fixates an active contour, after having viewed that con- 
tour in near periphery, the absence of consistent parallax and 
movement-information should then become clear to the visual 
system (and our acuity is extremely keen with respect to such 
information, detecting parallax down to about 2 seconds of 
visual angle). With active contours present on the video 
screen, depth and movement may thus be interpreted as being 
present wherever one doesn't look, and absent where one does 
look. (Something of the sort may happen with all pictures, 
particularly with pointillistic pictures and gross-screen 
halftones, but video may provide a more vivid and active form 
of the phenomenon.) These displays may therefore maintain a 
restless visual search activity within the confines of the 
video field (say, over a field of about 15°), quite
independent of the search activity that the visuomotor system normally
undertakes under the press of intrinsic and extrinsic perceptual 
tasks (44, 102, 190). Something much like this presumably is 
achieved, at much greater pains and with much more talent, by 
the use of composition to keep visual search activity (and 
visual interest) alive beyond the intrinsic demands of the 
subject matter or the extrinsic demands of the task that has 
been set the viewer. If we apply attribution theory (see sec- 
tion 1.2.4) to this circumstance, we would expect the viewer to 
consider any material that causes him in this manner to search 
more than he can explain, as being more interesting than he 
would otherwise consider it to be. 


Alternatively, it may be that such disconfirmed visual 
expectations are soon extinguished. Eyemovement research, using 
such displays, is clearly needed. It is particularly important 
to determine whether such a spurious search process does occur; 
whether it persists indefinitely or finally extinguishes; if it 
does extinguish, what kinds of schedules of presentations will 
maintain it; and how the search activity that occurs with active 
contours compares with the search that occurs with compositions 
composed of normal luminance-difference contours. Gross eye- 
movement records would suffice for such study, and can be ob- 
tained unbeknownst to the viewer. 


(2.3.3) Fovea


Another possible source of visuomotor conflict remains, 
especially likely when the fovea fixates an active contour. 
The eye can only focus at one distance at one time.
Accommodation (focus of the eye's lens) is most probably a
visually-guided action (7; although it is not a well-understood behavior).
The repetitive "busyness" at active contours may (like certain 
kinds of type-face that are difficult to keep in focus) provide 
cues that lead the system to treat the contours as though they 
are out of focus even when they are perfectly well focused.
This may produce an accommodative "hunting," or (alternatively)
it may lead the eyes to defocus down to some compromise level
at which the "busyness" is no longer detectable because the blur
at that level of defocusing masks it. (There is evidence (57)
that the eye is normally kept somewhat defocused, anyway, but
we are here proposing a more marked state of mis-accommodation.)
Another reason for the defocusing to occur is the conflict that 
may occur between periphery and fovea in terms of tremor sig- 
nals (cf. the discussion of this point above), which can only 
be resolved by defocusing the eye enough to obscure the contour 
activity. (There are still other reasons for the eye to de- 
focus while watching video displays: see 3.3, below.) 


A possibility of further visuomotor peculiarity (and pos- 
sible conflict) then arises from the above consideration as 
follows: The degree of accommodation that the eyes maintain, 
and the degree of binocular convergence that is needed to bring 
an object to single vision (binocular fusion) are normally 
closely coupled to each other in the visual system. If a de- 
focusing of accommodation occurs, that should either entail a 
convergence of the two eyes at some point other than the surface 
at which the viewer is looking, or should entail decoupling of 
accommodation and convergence. It is not generally thought that 
accommodation and convergence are decoupled in normal seeing, 
so it seems most likely that convergence would be deliberately 
out of adjustment, also. 


The effects of contour-active displays on accommodation 
can conveniently be measured by laser scintillation technique 
(17, 118) although probably not without the viewer's knowledge 
that such is being done; convergence changes would require much 
more elaborate methods (and ones which would interfere drastic- 
ally with normal viewing habits, whatever they are). 


(2.3.4) Possible effects of the above differences


In one way or another, therefore, watching active-con- 
tour video may elicit visuomotor anomalies or conflicts whose 
effects may range from causing the viewer not to look clearly 
at what he is watching attentively (!), to causing him to main- 
tain an active and repetitive (and perhaps purposeless and 
basically uninterested) scan of a small viewing area. Some 
form of visual disorientation seems likely, and we already 
know that visual disorientations often have affective and 
even emotional consequences: There are the undocumented but 
well-known effects of flickering and dazzling lights used in 
the Electric Circus and "psychedelic light shows," which them- 
selves deserve formal research; there are the effects on orien- 
tation that tilted and moving fields can produce (64, 261); 
there are the hypnagogic and possibly hypnotic effects of the 
Ganzfeld, which is any visual field that is blank and homogen- 
eous enough to prevent accommodative and convergent fixation 
from being maintained (15, 131); and there are the reports that 
defocusing normally accompanies "withdrawal" and fantasy or day- 
dreaming (11), which suggests that a display that elicits de- 
focusing for optical reasons may, by conditioning or circum- 
stance, encourage the fantasy activity. Research on these 
effects would be valuable to undertake, but the research 
methods are not self-evident, and require thought and
development. (Note that the heightened color, the active contours,
the visuomotor conflicts, and the heightened visual attention
to a single area are also all anecdotally features of many
psychoactive drug states. These video phenomena may be of more
value (perhaps reminiscent) to those who have had experience
with such states, experiences that were rewarding. It is even
conceivable, if far-fetched, that the reverse is also true:
that early experience with electronic displays is
predisposing to later enjoyment of psychoactive drugs which produce
similar perceptual effects.)


(2.4) Subjective contours 


We have seen that luminance-difference contours are in 
one sense essential to the perception of objects and scenes -- 
to the perception of anything other than fog. But that does 
not mean that the objects or shapes that we perceive must be 
bounded (or outlined) by luminance-difference contours: In
fact, whenever TV is viewed from a distance at which the raster 
is clearly visible, all representation is achieved by means of 
subjective contours -- contours that are not coincident with
luminance-differences, but are somehow "filled in" by the
"mind's eye." Such contours are not only essential to
perception under those conditions of viewing video: They are
also uniquely easy to produce in electronically-generated
displays, so that their characteristics, advantages and dis- 
advantages should be better understood than they are at present. 


(2.4.1) Completion contours


The classic forms of subjective contours are: 


(a) Those in which a set of repeating (or, at any rate, 
densely distributed) lines or elements are all interrupted co- 
linearly; the line along which they are occluded or interrupted 
is thus perceived to delineate a superimposed shape, whose im- 
plicit edges appear to occlude the interrupted set of elements 
(cf. 123, p. 431). 


(b) A set of dots placed on a piece of paper usually 
appear to group together forming one shape or another, in ac- 
cordance with so-called principles of Gestalt organization 
(123, p.431). 


(c) More striking than either of these two classical 
examples are the contours that can be produced in random dot- 
fields by parallax: In binocular vision, two displays made 
up of random sets of dots, neither of which, when viewed
separately, shows anything but the random texture, will be seen as
an object in front of a background if the two displays are
identical (albeit effectively random), except that a subset of
one display has been displaced relative to the other (e.g.
Julesz patterns, reviewed in 143).


(d) A similar effect can be obtained monocularly if some 
subset of dots is moved with a common vector, relative to the 
rest (127). In both cases, the object that is perceived 
appears to be bounded by a more or less sharp edge, which is 
usually peculiarly vibrant in character, even though the entire 
field is of homogeneous luminance and (except in the sense that 
each dot has a luminance-difference contour that makes the dot
itself visible) no luminance differences are needed in order for 
the object's shape to be perceived. 
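
The construction behind example (c) can be sketched in a few
lines. What follows is a minimal illustration under stated
assumptions, not a reproduction of any published stimulus: the
field size, the size of the displaced square, the amount of shift,
and the function name are all hypothetical.

```python
import random


def random_dot_stereogram(size=64, square=24, shift=4, seed=0):
    """Return (left, right) fields of random 0/1 dots.

    The two fields are identical except that a central square of
    dots is displaced sideways by `shift` columns in the right one.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]  # start as an exact copy

    start = (size - square) // 2
    stop = start + square
    for r in range(start, stop):
        # Copy the central square into the right field, shifted left.
        for c in range(start, stop):
            right[r][c - shift] = left[r][c]
        # Refill the strip uncovered by the shift with fresh random dots,
        # so that neither field alone contains a visible outline.
        for c in range(stop - shift, stop):
            right[r][c] = rng.randint(0, 1)
    return left, right


left, right = random_dot_stereogram()
changed = sum(
    1 for r in range(64) for c in range(64) if left[r][c] != right[r][c]
)
print("dots that differ between the two fields:", changed)
```

Presenting one field to each eye gives the binocular case (c);
applying the same displacement between successive frames shown to
one eye corresponds to the monocular case (d). In neither field
is there any luminance step along the edge of the square.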


Coren (54) has recently proposed that the first kind of 
subjective contour (a, above) is really a case in which the 
visual system has assumed that one surface is interposed in 
front of another, and that the contour that we perceive is a 
perceptual reconstruction of the object's occluded edge. This 
sounds right. The more general rule may be this: that the
conditions that lead the visual system to "expect" that the 
edge of an object's surface will be found to lie at a given 
locus of points in the visual field, will elicit the percep- 
tion of a contour at that locus. Contour-perception would thus 
become a special case of the more general perceptual process of 
surface-perception.


(2.4.2) Pointilliste pictures, halftone and "snow"


The use of cross-hatching, dots, texture, or small
brushstrokes to obtain shades of color and lightness is almost as
old as art itself. The process seems simple enough, at first 
glance: If we make the points of black and white (or of dif- 
ferent colors) so small that the eye cannot resolve them, they 
will be averaged into a single blended area. This indeed hap- 
pens, and is of course responsible for the mixture processes 
that underlie the perception of color in color TV. But this 
cannot possibly be the whole story: The cross-hatching, the 
dots, the raster (or whatever) can be perfectly discernible, 
yet the shades of gray will still be seen: we can detect both
the shading, and the individual dots on whose blending or
averaging the perception of the shading presumably depends. We
propose that this process, like that of completion-contours
discussed above, depends on the two-stage nature of vision,
that is, on the fact that a vague peripheral glimpse is fol- 
lowed by detailed foveal inspection: When a set of dots (or 
cross-hatching, etc.) falls in peripheral vision, the low
acuity of the periphery causes the dots to be averaged to form
what is perceived as an apparently homogeneous area which has
a true luminance-difference contour to set it off from other
peripherally-viewed areas that have a different average
luminance, etc. When the eye moves so as to bring the same set of
dots to the fovea, however, the dots are seen as the group of 
discrete elements they are, if the observer is sufficiently 
near the display to resolve them. From the right viewing dis- 
tance, therefore, such displays should have this particularly 
vibrant and dynamic property: contours, edges and surfaces 
alternately form in peripheral vision, and dissolve into
individual points at the center of the fovea as the viewer moves
his eyes about. The vibrancy, and the existence of an optimal
viewing distance, have been noted in the case of pointilliste
and impressionist painting (cf. Grosser, 105; Taylor, 233), 
although we believe that the reasons have not hitherto been 
clearly understood. 
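
The two-stage account proposed above can be illustrated with a
very crude numerical model. The sketch treats low peripheral
acuity as simple averaging over a window of neighboring dots,
and foveal viewing as no averaging at all. This is an assumption
made only for illustration: the window size, the dot patterns and
the function name are hypothetical, and real spatial summation in
the eye is certainly not a box average.

```python
def box_average(samples, window):
    """Average each run of `window` neighboring samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]


# One row of a halftone: 1 = bright dot, 0 = dark gap.
light_area = [1, 0, 0, 0] * 16   # 25% of positions are bright
dark_area = [1, 1, 1, 0] * 16    # 75% of positions are bright
row = light_area + dark_area

foveal = box_average(row, 1)      # every dot resolved individually
peripheral = box_average(row, 8)  # dots averaged over 8 positions

print("foveal values present:", sorted(set(foveal)))
print("peripheral level, first area:", peripheral[0])
print("peripheral level, second area:", peripheral[-1])
```

In the averaged row the two areas take on two different uniform
levels with a step between them, i.e., something like a true
luminance-difference contour; in the unaveraged row only the
discrete dots are present.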








The same principle should be operative either when poor 
reception imposes a blanket of visual noise (snow) on the TV 
screen, or when the viewer sits so close to the screen that he 
can discern the individual "lines" of the raster in foveal 
viewing: recognizable contours and shapes should be clearly 
visible in peripheral vision (see also 3.3, below), whereas in 
a small area at the center of gaze (an area whose size depends 
on the viewer's distance and on the size of the TV tube), those 
shapes should dissolve into bands or points of color. And there 
should therefore be an optimal viewing distance to obtain such 
effects, if the consequences of such pointillistic vibrancy are 
desirable. 


(2.4.3) Subjective surfaces and edges which are 
particularly accessible to video technology 


Viewing distances are probably great enough in most cases, 
and reception good enough, that neither raster nor snow is an 
important cause of subjective contour -- i.e., it seems plausible 
that many viewers sit so that true, uninterrupted luminance-
difference contours delineate objects' shapes, even to foveal 
viewing. But there are resources available to video technology 
that make it particularly easy to generate "synthetic surfaces" 
whose edges are subjective contours. 


There are two electronic ways of generating such "synthetic 
surfaces" that are readily available to video programming 
(and therefore also to motion pictures) but that cannot be 
achieved at all in other media. These are (1) textures and 
patterns of elements whose distribution and movements are 
designed and executed by computer; and (2) rapid and pre-programmed 
changes (and reversals) of light and dark (e.g., from 
negative to positive) and of complementary colors. We know 
a little about each of these exotic stimulus conditions, but 
not enough to provide a well understood palette for the video 
artist to use. 


(2.4.3.1)  Dot-defined surfaces and volumes 


The visual system can extract what is common and ordered 
about the distribution of a large number of separate elements 
(like points of light, the bits of a texture, etc.). Examples 
here: Small lights that are worn at actors’ joints as the actors 
shake hands, embrace, walk around in an otherwise totally dark 
room are sorted out by the visual system into the perception of 
a group of behaving people, and are not simply perceived as a 
wild array of moving dots (140); the extraction of surface 
planes (93, 140, 141, 142, 190) and volumes of space (cf. the 
logo for Star Trek) out of a sea of moving points of light, etc. 
Some of the principles that determine and limit this extrac- 
tion procedure for static displays have been studied under the 
heading of grouping (123, 160, 256), and a very little experi- 
mental data have been gathered (23, 24; Attneave, 12). More 
important are the dynamic grouping factors, and what looks 
like a powerful theoretical analysis and some preliminary 
experiments have been undertaken but have not yet been pur- 
sued very far (32, 139, 140). It is probably worthwhile to 
try to apply these rules to computer graphic programming. 


It is now possible to program display-generating com- 
puters to depict volumes, surfaces, etc., in three dimensions 
(e.g., by the flow-patterns of elements or dots that result 
from relative motion of observer and scene -- i.e., by dolly 
and tracking shots). It should also already be possible, at 
our present state of knowledge, to program such computers so 
as to choose just those patterns that will most readily be 
correctly perceived by the viewer, and to avoid those views 
or configurations which will be most difficult to perceive as 
the filmmaker desires them to be seen. In effect, this means 
that the computer can not only carry out the laborious work 
of rotating and dollying with respect to scenes and objects, 
once it is programmed to do the job -- it can also pick out 
the most effective ways of doing so. Research here is not 
difficult in principle (although it is probably quite expensive, 
to the degree that it depends on the services of commercial 
computer-graphics companies for exploratory and trial-
and-error work). The research questions here are (a) To what 
degree can we now effectively design synthetic visual surfaces, 
volumes and edges? (b) What are the properties of the contours 
that are thus created (see 2.5, below)? 


(2.4.3.2) Surfaces constructed of alternating colors, 
which are particularly easy to produce by electronic means, 
are discussed as perceptual effects in 1.5, above. 


(2.4.4) Subjective contours and their probable effects 
on comprehension 


When our gaze shifts from one object to another while we 
are looking at the real world, we know where one object lies 
relative to the other because (1) we ourselves have moved our 
gaze, and because (2) the first object may remain in peri- 
pheral vision when we regard the second (and vice versa). When 
it is the camera that has shifted its direction, of course, the 
first of these sources of information is unavailable, and if 
there is any overlap between two successive views that can 
serve to provide peripheral indication of the relative 
directions in which the two views lie, the overlap must then be an 
important factor in comprehending the scene that is being 
depicted by the successive shots. Although we do not have direct 
information on this point, it seems very likely that the patterns 
of light and dark, in peripheral vision, are important to the 
comprehension of successive views. 


In the case of equal-flux displays, in which lustrous 
objects are produced on lustrous backgrounds (see 1.5, above); 
in displays in which regions of the visual field differ from 
each other mainly in hue but are of equal lightness; and in 
dot-pattern displays of essentially equal density over the field 
of view -- in all of these, the viewer has no masses of 
different lightness in peripheral vision to enable him to decide 
rapidly where one view lies relative to others. That is, the 
viewer should lose track more readily of where he is when 
views change in equal-flux (or high- or low-key) displays. 
Computer graphics would seem to be particularly prone to this 
source of comprehension slow-down. 


Moreover, it should take longer for the viewer to detect 
when the scene has changed, when such equal-flux displays are 
used. Thus, these displays provide a good example of a 
separation between cognitive change and sensory change (cf. 1.2.3, and 
4.1): That is, the scene may change drastically, but there are 
no lower-level brightness changes in peripheral vision to sig- 
nal the lower levels of the processing system that the scene 
has changed. In these circumstances, the visual system 
"knows" that a change in view has occurred only after the 
observer grasps that the meaning of the new scene is different 
from that of the old. This is a relatively high-level process 
that may take considerable time (e.g., 500 - 1500 msec) and 
should be added to the time that it takes to discover what the 
new scene contains. In such equal-flux displays, it should 
therefore take longer than in normal displays to comprehend a 
new scene, and this should make visual understanding a more 
demanding task. It may be neither desirable nor undesirable, 
in itself, to increase the difficulty of comprehending the 
flow of scenes: As we shall see below, if we want to maintain 
a particular change rate (or cutting rate), and we don't want 
the viewer to grasp each view and tire of it before it is re- 
placed by another view (this is a particularly important goal, 
yet probably difficult to achieve in relatively uncluttered 
montages or in closeups), such equal-flux pictures as we here 
describe may help maintain comprehension at the desired level 
of slowness. Research is needed on such consequences of using 
flicker, lustre-color, and other approximations to equal-flux 
displays. 
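
The argument above can be made concrete with a small numerical 
sketch. Everything in it is an assumption introduced for 
illustration: peripheral vision is reduced to coarse block 
averages of luminance, luminance is approximated by the 
Rec. 601 luma weights, and the frames and function names are 
invented. It is meant only to show that a cut can exchange 
the contents of a scene completely while leaving the coarse 
luminance map untouched. 

```python
# Sketch only. Rec. 601 luma weights serve as a crude stand-in for lightness;
# block averaging serves as a crude stand-in for peripheral vision.
LUMA = (0.299, 0.587, 0.114)


def luminance(rgb):
    return sum(w * c for w, c in zip(LUMA, rgb))


def at_luminance(rgb, target):
    """Scale a color so that its luminance equals `target`."""
    k = target / luminance(rgb)
    return tuple(k * c for c in rgb)


def split_frame(left, right, size=8):
    """Hypothetical frame: one color on the left half, another on the right."""
    return [[left if x < size // 2 else right for x in range(size)]
            for _ in range(size)]


def coarse_luminance(frame, block=4):
    size = len(frame)
    means = []
    for by in range(0, size, block):
        for bx in range(0, size, block):
            cells = [luminance(frame[y][x])
                     for y in range(by, by + block)
                     for x in range(bx, bx + block)]
            means.append(sum(cells) / len(cells))
    return means


def peripheral_change(before, after, block=4):
    """Mean absolute change in coarse luminance across a cut."""
    a, b = coarse_luminance(before, block), coarse_luminance(after, block)
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# An ordinary cut: light and dark masses exchange places.
dark, bright = (30, 30, 30), (220, 220, 220)
ordinary_cut = peripheral_change(split_frame(dark, bright),
                                 split_frame(bright, dark))

# An equal-flux cut: hues exchange places, luminance is the same everywhere.
reddish = at_luminance((200, 60, 60), 120.0)
greenish = at_luminance((60, 200, 60), 120.0)
equal_flux_cut = peripheral_change(split_frame(reddish, greenish),
                                   split_frame(greenish, reddish))
```

Under these assumptions the ordinary cut produces a large 
coarse-luminance change, while the equal-flux cut produces 
essentially none, although both cuts rearrange the picture 
to the same extent. 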


Comprehension time is a rather abstract concept, and a 
method of measuring it is not immediately evident. Because 
the eye appears to "freeze" until a scene is visually compre- 
hended to some minimum degree, the time needed for the eye to 
start to move again after a scene has changed (i.e., eye-movement 
latency and frequency) is probably a particularly promising 
avenue of research on comprehension time. But research 
with other indices of change, such as GSR, would also seem 
worthwhile to explore. 


(3) Acuity factors and information limits 


A TV display can of course provide only a very small 
fraction of the field that the eye itself can look at directly: 
If the viewer sits close to the screen, or if a large set is 
used, so that the display fills a large part of the visual 
field, then the scanning raster is extremely coarse compared 
to what the eye can resolve. Conversely, if the distance from 
the screen (or the screen's size) is such that the scanning 
raster is not much coarser than the eye's resolving power, then 
the size of the screen's image must be very small -- say, about 
5°. These considerations place constraints on what can be 
shown the TV viewer, and how it can be shown. 


(3.1) Measures and determinants of acuity 


Optical resolution can be measured in various ways, some 
(but not all) of which are interchangeable. Video resolution 
sets some limits on the total information transmittable through 
the CRT display, with a particularly rigid form of redundancy 
being set by the horizontal lines of the scanning raster. 
Assume a line width of 1 mm., and a viewing distance of three 
meters, or a visual angle of approximately 1 min. of arc. That 
visual angle is usually taken as the minimum detectable separa- 
tion for the average observer (4.2.1, below), but we shall see 
that from the viewer's standpoint, this measure cannot be 
assigned too mechanically. Perceptual acuity (the psychologi- 
cal counterpart of optical resolution) has various measures for 
different purposes and circumstances, measures which are only 
slightly interchangeable. The most widely used measure is the 
minimum separable: a gap of about 1 minute of visual angle is 
taken as the approximate measure of detectability (e.g., for 
a c to be distinguishable from an o, the gap in the c must be 
at least 1 minute wide). The TV raster is marginally detect- 
able from about 3 yards, by that measure, and the redundancy 
that this imposes should make letters indistinguishable that 
would otherwise normally be just barely distinguishable to 
direct vision. This statement must immediately be qualified, 
however, in several ways. 


(i) Acuity falls off rapidly from the center of the 
retina. The fovea is generally considered to be a region of 
maximum acuity of 2° to 4° in extent, but even within that 
small region, acuity varies (143). If a video display of 40 
cm in horizontal extent is centrally fixated from a distance 
of 3 meters, its horizontal margins fall 4° each side of the 
center of the fovea. Assuming that the eye remains fixated at 
the center of the screen, at the edges of the screen the gap 
in a letter c will have to be much more than 1 minute of arc 
in order for the c to be detectably different from an o. 
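
The two angular figures used so far (a 1 mm raster line seen 
from 3 meters, and a 40 cm screen seen from 3 meters) can be 
checked with elementary trigonometry. The helper below is a 
sketch; its name and the choice of millimeters as the unit are 
our own. 

```python
import math


def visual_angle_deg(size, distance):
    """Angle subtended by an extent `size` viewed centrally from `distance`.

    Both arguments must be in the same unit (millimeters are used below).
    """
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


# A raster line 1 mm wide seen from 3 m, expressed in minutes of arc.
raster_line_arcmin = visual_angle_deg(1.0, 3000.0) * 60.0

# A screen 40 cm wide seen from 3 m: eccentricity of each horizontal margin
# when the center of the screen is fixated.
margin_eccentricity_deg = visual_angle_deg(400.0, 3000.0) / 2.0
```

The raster line comes to a little over 1 minute of arc, and each 
margin of the screen lies a little under 4 degrees from the 
point of fixation, in agreement with the round figures in the 
text. 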


(ii) The perceptual machinery does not respond inde- 
pendently to each point in a matrix, or raster, but is affected 
by redundancies in the stimulus display. Thus, the acuity for 
an offset or a "jog" in a straight line is only about 2 seconds 
of arc -- much finer than the minimum detectable separation of 
1 minute. A more appropriate measure to apply to video displays 
might be the m.t.f. (or modulation transfer function), 
a Fourier analysis of grating acuities. (Grating acuity is 
a measure of the spacing that a number of lines must have 
before their directions of orientation can be detected (cf. 
4.2.1).) Grating acuity would vary as a function of the 
grating's spacing, its orientation with respect to the raster, 
and the distance from which the set is viewed. This measure 
would be particularly useful with respect to learning the 
limits with which texture-density gradients can be represented 
in TV, a problem to which we refer below (3.2). Under optimal 
conditions, subjects can resolve gratings of lines that are 
about 35 - 40 seconds wide (56), but we know of no research 
applying this measure either to the perception of texture gra- 
dients or to video. Such research is needed, but two precau- 
tions should be noted now: 


(a) Moire between the test grating and raster pattern 
is an artifact which will affect the measure of grating detec- 
tion, and must be separately assessed. (b) We do not know how 
well grating acuity (and the modulation transfer function 
measure to which it is ideally suited) predicts other kinds of 
acuity. In particular, we do not know how well grating acuity 
measures will predict the detectability of the various patterns 
that are used in graphic communication, since the latter have 
distinctive features some of which are (in terms of these 
measures) at low frequency, some of which are high frequency, 
and some of which are broadband. (A dramatic example of this 
point is discussed in 3.3, below.) 
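
Precaution (a) can be illustrated with a short sketch of how a 
fine test grating and the raster combine to give a coarse 
moire. The sketch assumes ideal point sampling, one sample per 
raster line, of a grating that varies across the raster lines; 
real scanning spots have finite size, so this is a simplification, 
and the function names are our own. 

```python
import math


def apparent_frequency(cycles_per_line):
    """Frequency (cycles per raster line) that survives sampling once per line.

    Assumes ideal point sampling; the raster itself has frequency 1.
    """
    return abs(cycles_per_line - round(cycles_per_line))


def sample_grating(cycles_per_line, n_lines):
    """Luminance modulation of a cosine grating at each raster line."""
    return [math.cos(2.0 * math.pi * cycles_per_line * n)
            for n in range(n_lines)]


fine = sample_grating(0.9, 40)    # test grating: 0.9 cycles per raster line
coarse = sample_grating(0.1, 40)  # a grating nine times coarser
moire = apparent_frequency(0.9)   # approximately 0.1 cycles per raster line
```

Once sampled by the raster, the fine grating is indistinguishable 
from the coarse one, so an observer who "detects" it may be 
responding to the moire rather than to the grating itself. 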


Research is needed to determine the applicability and 
generalizability of this measure of visual acuity to video 
displays. 
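
For readers unfamiliar with the measure, the following sketch 
computes a modulation transfer function in the engineering 
sense: the ratio of output to input modulation for sinusoidal 
gratings of increasing frequency. The blurring stage (a 
5-sample moving average) is a hypothetical stand-in for a 
display of limited resolution and is not a measured property 
of any video system; all names are our own. 

```python
import math


def modulation(signal):
    """Michelson contrast: (max - min) / (max + min)."""
    hi, lo = max(signal), min(signal)
    return (hi - lo) / (hi + lo)


def grating(cycles, n=240):
    """Sinusoidal luminance grating with `cycles` periods across n samples."""
    return [0.5 + 0.5 * math.sin(2.0 * math.pi * cycles * i / n)
            for i in range(n)]


def blur(signal, width=5):
    """Circular moving average: a hypothetical resolution-limiting stage."""
    n, half = len(signal), width // 2
    return [sum(signal[(i + k) % n] for k in range(-half, half + 1)) / width
            for i in range(n)]


def mtf(cycles_list, width=5):
    """Output modulation divided by input modulation, per grating frequency."""
    return {c: modulation(blur(grating(c), width)) / modulation(grating(c))
            for c in cycles_list}


curve = mtf([2, 6, 12, 24, 40])
```

The transfer falls steadily from nearly 1.0 for the coarsest 
grating to about 0.2 for the finest; a single number such as 
"1 minute of arc" summarizes only one point on such a curve. 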


(3.1.1) Increasing acuity by temporal integration 


Because the eye can integrate information over time in 
various ways, the acuity limits imposed by the spatial "grain 
size" of the display may be surmounted by drawing on those 
integrative abilities. One way of doing so that has not been 
explored is to treat the raster like a screen or picket fence, 
behind which the scene to be shown moves in some systematic 
manner. As a rough analogy, consider a grainy motion picture 
(or the "snow" of low signal/noise ratio in a video display): 
In any one frame, the grain imposes a measurable information 
loss; but the grain is randomly distributed and changes from 
frame to frame, whereas the components of the original scene 
itself remain unchanged (or change systematically), so that 
the grain "averages out." It may be that a continuous pan or 
dolly achieves the same effect (in addition to serving other 
and more obvious functions) by moving the details that are 
obscured by the dark lines of the raster in one frame to an 
unobscured location in the next frame (this discussion ignores 
horizontal resolution limits), resulting in a spatial distribu- 
tion whose peaks define the details. One assumption here is 
that the visual system can store up partial detail from view 
to view, even when the views do not fall in registry (if they 
fell in registry, the whole issue would be a trivial one); 
there are recent data (65, 106, 107, 113) that can be inter- 
preted as supporting this assumption. 
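
Both halves of the analogy can be sketched numerically. The 
first part below averages frames that share one scene but carry 
independent grain; the second treats the raster as a picket 
fence that exposes only alternate rows of the scene. The 
Gaussian grain, the one-row-per-frame pan, and all names are 
assumptions made for illustration, and perfect storage from 
frame to frame is assumed, which is exactly the point that 
remains to be established for the visual system. 

```python
import math
import random


def rms_error(estimate, scene):
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(estimate, scene))
                     / len(scene))


def integrate_frames(scene, n_frames, sigma, rng):
    """Average n_frames copies of the scene, each with fresh Gaussian grain."""
    total = [0.0] * len(scene)
    for _ in range(n_frames):
        for i, value in enumerate(scene):
            total[i] += value + rng.gauss(0.0, sigma)
    return [t / n_frames for t in total]


def rows_seen(n_rows, shifts):
    """Scene rows that land on a visible (even) raster position in any frame."""
    return {r for r in range(n_rows) for s in shifts if (r + s) % 2 == 0}


rng = random.Random(0)
scene = [0.5 + 0.4 * math.sin(i / 10.0) for i in range(2000)]

single_frame_error = rms_error(integrate_frames(scene, 1, 0.2, rng), scene)
sixteen_frame_error = rms_error(integrate_frames(scene, 16, 0.2, rng), scene)

static_rows = rows_seen(10, [0])     # camera at rest: half the rows are hidden
panned_rows = rows_seen(10, [0, 1])  # one-row pan: every row is seen at least once
```

With independent grain, averaging sixteen frames reduces the 
error by roughly the square root of sixteen, that is, by about 
a factor of four. With the picket-fence raster, a static view 
exposes only the even rows, whereas two successive views 
displaced by one row expose them all. 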


(3.1.2) Pattern sensitivities 


The fact that the perceptual system makes use of stimu- 
lus redundancies (cf. 3.1, ii) necessarily implies that it also 
responds to more than one point at a time (i.e., that it responds 
to patterns and not merely to the aggregate of individual, stim- 
ulated points), and therefore, implies the use of sensitivities 
outside of the fovea. 


What is more, it seems likely that different kinds of 
training result in different sensitivities to patterns (e.g., 
it is possible that near-sighted viewers are sensitive to dif- 
ferent features of objects than are normals; 228). We do not 
know what these pattern-sensitivities are, how far they extend 
over the retina, nor how they differ with age, education and 
purpose of viewing. But we do know at least this: that letters 
(and presumably other shapes as well) can be recognized as pat- 
terns (or as packets of characteristic distinctive features, 
which may not be quite the same thing); that larger familiar 
collections of letters (e.g., familiar words) can be recognized 
as units by use of features that are probably larger than the 
individual letters (i.e., by "word features"; cf. 262). It is 
a reasonable assumption that the shapes of printed letters and 
words have so evolved in use that they make good use of the 
features that are available to the normal visual nervous sys- 
tems, and that they will therefore be good measures of pattern- 
acuity, over and above the more local acuity measures discussed 
in the previous section (i.e., letter legibility should be a 
good measure for pictorial use, as well as for text). Letters 
and words are readily varied as stimuli; they are readily scored 
as responses; and we have a great deal of experience with using 
letters and words as experimental materials, so they are good 
measures to use in the operational sense. What the actual 
features are by which we recognize letters and words is not 
known with any assurance at present; but while it would be 
helpful to know them, if we intend to generalize from reading- 
acuity in video displays to other forms of video acuity, we 
need not await such knowledge in order to use readability as 
a measure of video display characteristics. 


(3.2) Reading as shape perception in video displays 


There are 3 reasons for studying reading in video dis- 
plays: (1) Most important, the recognition of printed materials 
is in many ways a better index of the medium's perceptual ef- 
fects and capacities than the more "objective" tests used to 
measure and define video resolution. (2) As symbols, printed 
text has emotional effects, as well as informative value, that 
are different from spoken words and from pictures, and the 
video media are in some respects exceptionally well-suited to 
exploit those effects. (3) Text is an important part of the 
video display, whether in overt or covert display (e.g., text 
and subtitles vs. labels on pictured products). 


(3.2.1) Legibility studies 


There is a large body of legibility studies, most of 
those through 1963 having been indexed by Cornog and Rose (55) 
and somewhat less extensively by McCormick (183). The former 
provide an index of studies on the legibility of alphanumeric 
characters and other symbols. These are classified in terms 
of the presentation techniques, the typography, the response 
measures, etc., that were used. A few principles are probably 
quite general to the reading process: In skilled reading (as 
opposed to letter-by-letter decoding), responses are made as 
readily and as quickly to familiar words (and even to familiar 
phrases) as to individual letters (262); such responses are 
made on the basis of partial information (e.g., beginnings or 
ends of words; word length; perhaps word-pattern features), 
and on expectations of what the writer is likely to say next, 
etc. Letters are more readily detected than the closed shapes 
that are often used as symbols in electronic displays (200; 
but cf. 116). The higher the familiarity of a word, the lower 
the duration required to recognize it (133, 134, 221): In fact, 
familiarity, frequency, expectability, meaning -- these affect 
all measures of legibility: they affect the speed of recognition, 
the exposure duration required for recognition, the 
distance at which the word is legible, etc. (236). Upper halves 
of words are more legible than the lower halves (236), lower 
case letters are more legible in the context of meaningful 
words than they are as individual letters (71, 236). And for 
this reason, words can be partially missing or wrongly spelled, 
and that fact will go effectively undetected (the "proofreader's 
error") as long as one reads for meaning and not 
merely to check letters (262). In contrast, meaningless strings 
of letters require much longer exposures to recognize, and 
unfamiliar nonsense shapes have to be dealt with piecemeal, 
identifiable element by element. (Similar phenomena occur in 
hearing as in vision, in that missing phonemes are "heard" as 
long as some noise (a burst of white noise, a cough) is pre- 
sented at the time in which the phonemes should have occurred 
(248, 249); whereas strings of meaningless sounds, even if they 
are as few as four in number and the listener knows what they 
will be, cannot be recognized as to the order in which they 
occurred (250).) And this in turn means that the eye is 
picking up features of the words, other than each specific 
letter of which each word is composed, and must be picking up 
such features outside of the central fovea. Estimates vary 
of how far outside of the fovea features are picked up, and 
because different kinds and levels of cues 
(some of them sentence-wide in nature) are used to determine 
what response the subject makes, the precise figure cannot be 
fixed. (A useful line of research would be this: to vary the 
number of words-at-a-time in which a message appears, and to 
study the effects of that variable on the recognition of a 
given word in that message. This measure will tell us how much 
is being picked up peripherally in a given context. And in any 
case, we don't know just how many words are needed to achieve 
a decent reading rate on something like a TV display.) But 
regardless of what the precise figure turns out to be, it is 
clear now that word-recognition draws on larger and on more 
peripherally-visible features than are measured by the 1 min. 
figure for visual acuity. What those features are is not 
presently known. 


A promising line of inquiry is suggested by what may be 
an important finding by Erdmann and Neal (71): They varied 
the vertical and horizontal resolution of letters, using a digital 
scanning device; they also varied character size and familiar- 
ity. In a previous study (70), they had examined the legibility 
of letters as a function of the first two of these factors. 
Familiar words and familiar individual letters were consistently 
more legible than were unfamiliar words at all sizes, which is 
of course what all other research has led us to expect. But 
their most important finding was this: Words were more legible 
than lower-case letters alone only in the case of familiar 
words at high levels of legibility. For familiar material, the 
average legibility of words was equal to or higher than the 
average legibility of the individual letters; for unfamiliar 
material, word-legibility is less than it is for individual 
letters. But since correlation coefficients were high for all 
words (.93 - .98), it seems that it is still useful to know the 
legibility of the individual letters, since one can use that 
information to predict (approximately) the legibility of words 
that are constructed from those letters. Furthermore, these 
data seem to imply that the effects of degradation on the 
features by which words are recognized are pretty much the 
same as the effects of degradation on the features by which 
individual letters are recognized. 


As far as individual letters' legibilities (and symbol- 
identifiability) in video displays are concerned, 6 lines per 
symbol seem to suffice (155, 156), although Bell (1967) and 
Shurtleff and Owens (216) recommend a minimum of 10 lines per 
character; and the curious fact appears to be true both for non- 
alphanumeric symbols (16, 116) and for alphanumeric symbols (216, 
217) that, in order to achieve a given degree of legibility, 
decreasing the number of raster lines that compose the image 
may be compensated for by increasing the visual angle that the 
figure subtends, up to about 16 min. of arc: Further decreases 
in raster lines cannot be compensated for by increasing the 
visual angle that the figure subtends. With fixed raster and 
viewing distance, legibility decreases as size decreases, of course. 
Specifically, the x-height is the important feature of lower 
case letters, with a minimum of about 1.1 mm., in printed 
text, or about 10 min. of visual angle. Giddings (99), how- 
ever, also finds a decline in legibility as character height 
is increased past an optimum of about 18 min. (e.g., .156 
inches in height at 30 inches of viewing distance), or 14 
raster lines for words, and 21 minutes (.187 in. at 30 in.), 
or 18 raster lines, per digit. 
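
The angular figures quoted in this paragraph can be checked 
directly. The sketch below converts the stated heights and 
distances to minutes of arc. The 380 mm reading distance used 
for printed text is our assumption (the text gives only the 
height and the angle), and the helper name is our own. 

```python
import math


def arcmin(height, distance):
    """Visual angle, in minutes of arc, of `height` seen from `distance`.

    Both arguments must be in the same unit.
    """
    return math.degrees(2.0 * math.atan(height / (2.0 * distance))) * 60.0


# Giddings' optima, as quoted: heights in inches at a 30-inch viewing distance.
words_optimum = arcmin(0.156, 30.0)
digits_optimum = arcmin(0.187, 30.0)

# Minimum x-height in print: 1.1 mm at an ASSUMED reading distance of 380 mm.
x_height_print = arcmin(1.1, 380.0)

# Angular height of one raster line implied by the quoted line counts.
arcmin_per_line_words = words_optimum / 14
arcmin_per_line_digits = digits_optimum / 18
```

The quoted figures are mutually consistent: the two optima come 
to about 18 and 21 minutes of arc, and a 1.1 mm x-height at an 
ordinary reading distance subtends about 10 minutes. In both of 
Giddings' cases a single raster line subtends a little more than 
1 minute of arc. 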


In print, line length (i.e., the length of the string of 
letters on the page) has a significant effect, independent of 
character size (174). This fact, taken together with Giddings' 
findings we have just cited, and the fact that at any given time 
we are watching a fixed raster, suggests that there is some 
optimal size for TV text: It is not simply a matter of making 
the text as large as possible. 


(One thing about which we can find no information is the 
effect of "rolling" text up the screen, or across the screen, 
as a way of increasing the textual information that a display 
can hold. Pacing will obviously always be a problem with such 
procedures, as it has been in motion picture usage of this 
technique, but the problem is aggravated in video because the 
much smaller resolution available means that fewer words can 
actually be present at one time, and that the viewer's reading 
rate must therefore be closer to the rate at which the material 
is presented.) 


As far as measuring the legibility of text is concerned, 
average perception time per line reflects the difficulty of the 
material being read (237). This suggests that a self-adjusting 
procedure, in which the subject himself regulates the rate of 
presentation, could be used to test legibility "online." (A 
payoff matrix would be needed to keep his errors at some fixed 
level: say, 95% accuracy.) This method could be used, in 
combination with pictorial material, to measure the 
comprehensibility of the picture itself by measuring effects on the 
legibility of the text that accompanies it. That is, by 
measuring the degree to which the presence of a particular set 
of pictorial displays increases the speed and/or accuracy with 
which the text is read, we will have measured the extent to 
which the pictures have contributed to the viewer's correct 
expectations about the text. This is an important line of 
inquiry, since measures of comprehensibility are going to be 
badly needed -- a point to which we return later. 
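
One way to hold accuracy near a fixed level, offered here as 
an alternative to a payoff matrix and not as the procedure 
proposed above, is a weighted up-down staircase: the time 
allowed per line is shortened slightly after each correct 
response and lengthened more after each error. The step sizes 
balance when accuracy equals the target, since p x (step down) 
= (1 - p) x (step up). In the sketch below the simulated reader, 
his logistic accuracy curve, and every parameter value are 
hypothetical. 

```python
import math
import random


def p_correct(ms_per_line):
    """HYPOTHETICAL reader: accuracy rises with the time allowed per line."""
    return 1.0 / (1.0 + math.exp(-(ms_per_line - 400.0) / 100.0))


def run_staircase(trials=20000, target=0.95, step=5.0, start=1500.0, seed=1):
    """Weighted up-down staircase on presentation time (ms per line)."""
    rng = random.Random(seed)
    # Lengthening applied after an error; 19 x step for a 95% target.
    up = step * target / (1.0 - target)
    ms = start
    outcomes, history = [], []
    for _ in range(trials):
        correct = rng.random() < p_correct(ms)
        outcomes.append(correct)
        history.append(ms)
        if correct:
            ms = max(50.0, ms - step)   # present the next line faster
        else:
            ms = ms + up                # slow down after an error
    return outcomes, history


outcomes, history = run_staircase()
settled = len(outcomes) // 2
accuracy = sum(outcomes[settled:]) / len(outcomes[settled:])
mean_ms_per_line = sum(history[settled:]) / len(history[settled:])
```

Once the staircase has settled, accuracy stays close to the 95% 
target, and the average time per line at which it settles is 
the legibility measure; adding a helpful picture should then 
show up as a shorter settled time. 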


(3.2.2) "Subliminal" messages and flash frames 


To be illegible does not necessarily mean to be in- 
effective. There was a great flurry of speculation and re- 
search, in the early 1950's, about the possibility of present- 
ing messages (usually words and sentences, but sometimes 
pictures) which would be too fast (or too dim, etc.) for the 
viewer to be aware of consciously, but which would presumably 
be recognized unconsciously, and therefore affect the viewer 
at an emotional and uncritical level (cf. 197). This was 
exciting, theoretically, because it promised to bring psycho- 
analytic inquiry into the laboratory; practically, it seemed 
to offer a means of heightening the impact of entertainment 
(and of selling products and people) that was particularly suited 
to TV, because it could be flashed in a single unnoticed frame; 
that seemed potentially more effective than any normal presenta- 
tion could be, because the presentation is unnoticed; and poten- 
tially more dangerous, because the viewer would not and could 
not bring his critical faculties to bear on the message. 


Research on which this speculative structure was built 
was inadequate in two ways: (1) Some of the reported effects 
were due to response biases (i.e., were due to subjects’ readi- 
ness to report seeing one word rather than another). (2) The 
conception of a subthreshold "unconscious" message depends on 
our assumption that there is a clearcut threshold, above which 
the stimulus is consciously recognizable and below which it is 
not -- and that assumption is false. Because of these methodo- 
logical and conceptual defects, this line of inquiry has been 
pretty well abandoned, although desultory research continues, 
usually with additional negative results (29, 198). 


But the baby may have gone out with the bathwater: The fact is that 
a very brief exposure (e.g., a flash frame) can provide information 
that will affect the viewer, and in particular will affect 
the way in which he looks at subsequent displays, even if the 
brief presentation is gone before he can fully grasp what it con- 
tained and reexamine it. The shock effects of such interpola- 
tions; the uncertainty about their contents; and the fact that 
the viewer must reconstruct their meaning while he is looking 
at other things, make such flashes of text (or of picture) a 
unique component of motion pictures and TV (even if they are 
not truly "subliminal" nor "unconscious"), and a rich field 
for research. 


Formal titles have been abandoned in motion pictures, 
but Godard and the underground filmmakers remind us that there 
are still occasions in which one word is worth many pictures. 
With the growth of experimental video techniques, flash-frame 
substitutes of words for pictures may turn out to be accept- 
able and effective. It would seem worthwhile to undertake 
research to determine whether flash-frame interpolation of very 
brief messages or scenes can, after the viewer has learned to 
expect them, act as mise en scene and provide anticipation 
of future transitions, regardless of the unfortunate history of 
research on subliminal perception. 


(3.3) Consequences of texture-information limits and 
distortions 


Regardless of how good the reception may be, the scan- 
ning raster must result in a degraded image, in all viewing 
conditions in which the line width is 1 min. of visual angle 
or more (see p. 39). This degradation imposes limits that have 
not been well explored and that may be more interesting than 
mere "noise" degradation would be; most fascinating is the 
fact that those limits can apparently be overcome to some ex- 
tent: 


(3.3.1) The advantages of defocusing 


A picture that is degraded by breaking it up into homo- 
geneous patches (which of course is what happens along one 
dimension within the width of a raster line) is rendered more 
interpretable under the following conditions: if it is viewed 
at a greater distance; if it is thrown out of focus as a 
stimulus; if it is viewed with somewhat defocused eyes; if it is 
viewed through a diffusing screen, etc. This effect demands 
investigation by anyone interested in video displays. Whether 
it is due to the elimination of the high-frequency components 
that are offered by the sharp edges of the raster or whether, 
as Julesz and Harmon (143) propose, the improvement is due to 
the elimination of most or all of the visual frequencies that 
fall within a range, two octaves in width, that contains the 
frequencies that comprise the essential pictorial elements -- 
in either case, some defocusing of the viewer's accommodation 
would help him see the picture, and would be reinforced. It is 
very important to determine whether in fact the perceptuomotor 
system maintains a particularly defocused posture when viewing 
video displays, and what the consequences of a defocused 
posture are if it does so (see p. 32). It would also be import- 
ant to determine whether there are ways of achieving the same 
goal by some other (electronic or optical) method of intervention. 
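
A one-dimensional sketch shows why blurring a picture that has 
been broken into homogeneous patches can bring it closer to the 
original. A smooth luminance ramp stands in for the picture, 
block averaging for the patchwise degradation, and a moving 
average for defocus; all three are our simplifications, and 
root-mean-square error is not a measure of interpretability. 
The sketch shows only that the spurious patch edges, and not 
the picture, are what the blur removes. 

```python
def block_quantize(signal, block):
    """Replace each run of `block` samples by its mean (homogeneous patches)."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def defocus(signal, width):
    """Centered moving average; samples beyond the ends repeat the end value."""
    half, n = width // 2, len(signal)
    return [sum(signal[min(max(i + k, 0), n - 1)]
                for k in range(-half, half + 1)) / (2 * half + 1)
            for i in range(n)]


def rms_difference(a, b, margin):
    """RMS difference, ignoring `margin` samples at each end."""
    pairs = list(zip(a, b))[margin:len(a) - margin]
    return (sum((x - y) ** 2 for x, y in pairs) / len(pairs)) ** 0.5


original = [i / 63.0 for i in range(64)]   # smooth ramp from dark to light
patchy = block_quantize(original, 8)       # eight homogeneous patches
blurred = defocus(patchy, 9)               # patches viewed "out of focus"

error_patchy = rms_difference(patchy, original, 8)
error_blurred = rms_difference(blurred, original, 8)
```

Away from the ends of the ramp, the blurred version lies several 
times closer to the original than the sharply patched version 
does, although the blur adds no information of its own. 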


(3.3.2) Effects of raster on textured displays 


One point at which the characteristic fidelity limits 
of video must be particularly important is in the representation 
of texture, and in the representation of gradients (rates of 
change) of texture density. This is so because of resolution 
limits, and because the scanning raster must interact with any 
regular, repetitive pattern (p. 27). In theory, this should 
make TV a low-fidelity medium indeed, as we shall see, but 
there has been no study of the practical consequences of these 
visual limitations. Two possible consequences are considered 
here: 


(3.3.2.1) Effects on surface quality and on spatial
information


Objects’ surface qualities are heavily determined by 
their textures (96); these qualities must be poorly conveyed, 
therefore, in static presentation via video display. The same 
must be true of the surfaces' slants and tri-dimensional 
orientations, to the extent that these attributes are conveyed 
to the observer by the means of texture-density gradients (95), 
but the actual efficacy of such static texture-density gradients 
is questionable, in any case (51, 89). With moving surfaces 
(or a moving camera), however, the situation may be quite dif- 
ferent: the visual system may be able to extract the surface's 
texture from the "overlaid" scanning raster, subtracting the 
latter as a constant screen through which the surface texture 
can be discerned as its elements systematically perturb the 
raster. The perceptibility of represented objects’ surface 
texture, their tri-dimensional orientations and their motions 
in space (which are normally revealed by the ways in which the 
textures undergo transformations due to motion perspective -- 
cf. 94) may therefore be restorable, at least in principle, by 
relative motion (i.e., by pans, dollies, etc.). But this will 
not automatically be true: some combinations of texture, motion 
and spatial orientation should interact with the raster to pro- 
duce moire patterns that themselves act as dynamic texture- 
density gradients. These false texture-gradients would be 
either uninformative, at best, or actively misleading (being, 
say, opposite in the slant or direction of motion they imply 
to the motion and slant of the object that is being represented).
There is little that can be done to avoid such misleading pat- 
terns when we are photographing real scenes and people, other 
than to avoid textures as much as possible. The advent of computer
graphics, however, makes the use of tailor-made textures
and of texture-manipulation practical, and research in this 
area seems very worthwhile to undertake. 
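
The way in which a regular texture can interact with the raster to yield a false, coarser pattern can be sketched as a sampling problem. The following is a minimal sketch under a strong simplifying assumption: that the raster samples the picture at ideal, equally spaced points along the vertical dimension. The function name alias_frequency and the figure of 480 raster lines are hypothetical, chosen for illustration only.

```python
def alias_frequency(texture_cycles: float, raster_lines: int) -> float:
    """Apparent frequency (cycles per picture height) of a regular texture
    after ideal point sampling by `raster_lines` equally spaced lines.

    Frequencies are folded back into the range 0 .. raster_lines / 2.
    """
    folded = texture_cycles % raster_lines
    return min(folded, raster_lines - folded)


# A coarse texture is reproduced at its own frequency ...
print(alias_frequency(100, 480))
# ... but a fine texture close to the line count appears as a coarse pattern.
print(alias_frequency(470, 480))
```

Under these assumptions a texture of 470 cycles per picture height appears as a pattern of only 10 cycles, so a gradual change in the density of a fine texture could be displayed as a quite different change in the density of the false pattern.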


(3.3.2.2) Conflict of two- and three-dimensional
"depth cues" 


The failure of textural information, the omnipresent 
high-contrast border that is produced by the frame of the picture
tube, the strong frontal texture that is provided by the
raster and by any visual noise that may be present (inasmuch 
as the latter is probably statistically a uniform texture- 
density gradient of 0.0) -- all of these are indications (or 
"depth cues") to the viewer that the surface is flat. There 
are normally features in the scene being represented on the 
surface of the tube that contain strong depth cues (particular- 
ly, motion parallax) which signal the fact that different parts 
of the scene are at different distances from the viewer. There 
is thus a perceptual conflict between the features of the dis- 
play that lead the viewer to perceive tri-dimensional space, 
on the one hand, and the particularly strong flatness cues of 
video displays, on the other. Such conflict has been variously 
held to provide and maintain visual interest (137); to decrease 
the strength of the geometrical illusions and decrease the 
degree of portrayed depth (103); and to increase the importance 
(and noticeability) of the "ground shapes" -- the shapes between 
the objects. This last factor, if it is true, should presumably 
make the esthetic consequences of pictorial composition and 
graphic design more important in TV than they are in motion pic- 
tures -- a point that seems somewhat implausible on the face of 
it. None of these features have, to our knowledge, been tested 
at all, although it would be important to have information about 
them, and although techniques for research are probably not too 
difficult to design. 


(4) Principles of TV cutting 


In discussing how special video factors affect the
viewer's comprehension of any change in scene, we first present 
a brief outline of what appear to be the perceptual principles 
that underlie cinematic representation as a general cognitive 
task, and then consider the two variables of cutting rate and 
comprehension time as they are specifically affected by factors
characteristic of TV: i.e., by display size, resolution and 
viewing distance. 


Esthetic considerations aside, this is an important area, 
and one in which reliable and quantitative knowledge is to be 
gained. Aside from a very few studies that are only indirectly 
relevant to cutting and editing (101, 99, 127), there is at 
present practically no research base for the speculations that 
have been offered. We can review what speculation there is 
briefly enough. There are two main areas in which formulations 
specific enough for research purposes have been made: (1) Cut- 
ting rate; (2) Comprehension. 


(4.1) Cutting rate 


There is considerable tradition and some actual research 
to the point that viewers prefer stimuli that are novel (and 
stimuli at some optimal level of complexity) to those to which
they have been long exposed, that they look longer at such stim- 
uli and that they show more physiological indices of arousal 
and/or of "cognitive load" in response to novel and/or optimally
complex stimuli. "Visual fatigue" sets in rapidly (224). Motion
pictures and TV can change the stimulus via cutting, to maintain
visual arousal and interest at the desired level. The fact that 
cutting can have beneficial effects is doubly fortunate, because 
TV, with its small and low-resolution display, must change its 
views often for two reasons: It must use montages of many 
views (usually closeups) to assemble larger scenes; and it 
must also change often because the low resolution of the screen 
probably also makes for displays having low information (low 
complexity), which in turn makes for rapidly accruing boredom 
with a given view. These two reasons interact in their effects 
on cutting rate, which itself has at least two determinants:
(i) the purely sensory components of change per se, and (ii)
the change in comprehended content. Spottiswoode (224) has
offered a speculative model of optimal cutting rate which incor- 
porates both determinants. Although Spottiswoode's theory is 
based solely on his introspective observations, it does seem
to fit the parameters that have been obtained in much more
recent and objective research in visual perception (125, 223).
And although Spottiswoode's prescriptions have been formulated
for motion pictures, it should not be difficult to apply them 
to video (to which they should, in fact, be even more important 
than they are to film). 


(4.1.1) Sensory determinants of cutting rate


Spottiswoode asserts that the pure fact of sensory 
change per se, regardless of the content of the change, main- 
tains visual interest or arousal (which he calls affective 
tone). When a change occurs, it takes 200 msec to register 
that the eye has received a new view. Arousal then arises 
rapidly, and falls somewhat more slowly than it rises. If the 
next change is made just when the increment that was produced 
by the previous change has dissipated completely and the viewer's 
arousal returns to zero, no overall increase in arousal, from 
one cut to another, will occur. With a shorter delay between 
changes, there will be a net increase in arousal. A progress- 
ively accelerated cutting rate will be required to maintain that 
level, however, because the viewer comes to expect the changes, 
and they therefore produce smaller increments of "surprise" or 
arousal. The greater the sensory changes, the greater the in- 
crement, and what we mean by sensory "change" here is probably 
the abrupt occurrence of gross differences in the distributions 
of light and shade. Therefore, changes from one equal-flux 
display to another; between displays that have homogeneous dis- 
tributions of lightness; and between displays whose masses of 
light and shade are very similar -- all of these transitions will 
be low in arousal value, even though the successive views may 
change greatly in meaning (e.g., from a scene in a forest to 
a scene in a factory). 
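
The qualitative account above can be made concrete with a toy simulation. This is a sketch under stated assumptions and not Spottiswoode's own formulation: only the 200 msec registration delay comes from the text, while the triangular shape of each increment, the rise and fall times (0.3 and 0.9 sec), and the names increment, arousal and peak_arousal are hypothetical. The decline in "surprise" with expected changes is deliberately left out.

```python
LATENCY = 0.2  # seconds before a change registers (figure given in the text)
RISE = 0.3     # hypothetical rise time of the arousal increment, in seconds
FALL = 0.9     # hypothetical (slower) fall time, in seconds


def increment(t_since_cut):
    """Arousal contributed by one cut, t_since_cut seconds after it occurred."""
    t = t_since_cut - LATENCY
    if t <= 0 or t >= RISE + FALL:
        return 0.0
    if t < RISE:
        return t / RISE                 # rapid rise to a peak of 1
    return 1.0 - (t - RISE) / FALL      # slower return to zero


def arousal(t, cut_times):
    """Total arousal at time t: the sum of the increments from all cuts."""
    return sum(increment(t - c) for c in cut_times)


def peak_arousal(interval, n_cuts=10, step=0.01):
    """Highest arousal reached when cuts are spaced `interval` seconds apart."""
    cuts = [i * interval for i in range(n_cuts)]
    end = cuts[-1] + LATENCY + RISE + FALL
    n_steps = int(end / step) + 1
    return max(arousal(i * step, cuts) for i in range(n_steps))


# Cuts spaced so that each increment has fully dissipated: no accumulation.
print(peak_arousal(RISE + FALL))
# Cuts spaced more closely: successive increments overlap and add.
print(peak_arousal(0.6))
```

With the longer spacing the peak never exceeds that of a single increment; with the shorter spacing the peak is higher, which corresponds to the net increase in arousal described above.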


Marked sensory changes are therefore changes of view 
which do not depend on the viewer's perception that the meaning 
or content has changed in order for him to detect that changes 
have occurred. Films or video sequences that are designed to 
maintain and manipulate the arousal effect of such sensory 
changes should be effective with any audience, regardless of the 
viewer's culture and knowledge (cf. p. 15). But substantive con- 
tent interacts with these sensory determinants, and must, in 
general, be taken into account. 


(4.1.2) Substantive determinants of cutting rate 
(comprehension time) 


Because of its simplicity, its general familiarity, or 
its expectedness, a view may need (and should receive) only a 
very brief exposure (say 500 msec). A more informative, un- 
expected or complex shot will take more time to comprehend, and 
should therefore be able to sustain a slower cutting rate without 
loss of interest: each shot can last longer on the screen before 
the viewer "gets bored." According to these assumptions, then, 
there is a trade-off between several factors whenever a view
changes: To the degree that the two views are dissimilar in 
sensory distribution and similar in content, the time needed to 
understand the transition between them will be short, and the 
filmmaker or video director can switch from one view to the next 
without confusing the viewer. If a sequence of such easy transi- 
tions is presented, the changes can therefore occur with a 
relatively high frequency (e.g., 500 msec per shot, or 2 Hz).
But because the viewer will eventually expect each change to 
occur, the rate will have to be speeded up, or the transitions 
will have to be changed in some other way, in order to obtain 
from them the same increments in arousal, and to maintain the 
same degree of visual interest or momentum. By making the con- 
tent less easy to comprehend within a single glance, we can thus 
increase the time that can elapse before the subject tires of 
each view, and the same degree of visual interest can presumably 
be maintained at a slower cutting rate. 


There are a number of interesting relationships that one
can elaborate from the simple functions that Spottiswoode proposes
(including the implication that one could presumably
measure on-line comprehension rate by determining the cutting
rate that maintains a steady level of visual interest -- surely
a very important tool, if true, because obtaining a measure of
comprehension time must be given a high priority for a number
of theoretical and practical reasons). And research tests of
those relationships seem relatively easy to perform, if suitable
measures of visual interest can be found. (E.g., does the time
required to comprehend each individual view determine the rate
at which the views can be presented in succession? Does visual
interest require an accelerating cutting rate, or can a relatively
simple -- or random -- sequence of rate-cutting maintain visual
interest?)


At present, there is no research base for answering such 
questions: we have only intuition and speculation on these 
matters, so further spelling out of the implications of Spottiswoode's
proposals is premature.


But note that this is an important line of research to 
initiate for video purposes. Also note that cutting rate cannot 
be prescribed without knowledge of transition-comprehensibility.
Comprehensibility is itself not merely an empirical question, to
be determined by testing the comprehensibility of a particular 
set of views: there are some perceptual determinants of transi- 
tion-comprehensibility that we can identify in principle 







that appear in both views (i.e., identifiable areas appear with 
some displacement before and after the disjunctive transition); 
and as long as the displacement between those masses of dark and 
light, from one view to the next, is not too great. These asser- 
tions are made on the assumption that something like Johansson's 
vector-extraction explanation of "optical proprioception" is 
correct and is applicable to discontinuous as well as to con- 
tinuous transitions (140, 141). Research is needed to determine 
whether in fact that assumption is correct. But even if the 
assumption is correct, it does not mean that comprehension of 
Type II transitions is always achieved rapidly (or that it is
achieved at all). If the overlap between views is small; if the 
scene lacks readily-identifiable masses that appear in both views 
(which is frequently the case in video, because of size considera- 
tions -- cf. p. 5/); if the displacement of those masses from 
one view to the next is too great (e.g., in "stop action," there 
may be some separation between the places in which an object 
appears in successive views that cannot be exceeded without 
destroying the intelligibility of the movement being depicted) -- 
in these cases, comprehension should be impaired. More important,
although in real life the succession of views that the
eye receives will usually have a great deal of overlap between
views of the same object, the artificial juxtaposition of views
that is possible in motion pictures in general, and that is
particularly likely in experimental video, may bring Type I
and Type II transitions into conflict with each other, to the
detriment of the Type II transition; we consider this question 
in 4.2.2, below. In any case, the study of these factors seems 
straightforward and fruitful. 


Type III transitions are changes of view which could 
not be produced by any action that the observer could take in
real time or real space, and which are not possible within a
single scene (or "long shot"). Motion pictures would be very
different if only Type I and Type II transitions were used, and 
the fact that video displays must rely heavily on closeups, and 
that they contain little peripheral information (and hence little 
overlap from view to view) means that in video sequences even 
Type II transitions act very much like Type III transitions. 
We have proposed above that Type II transitions (i.e., abrupt 
changes of view from a single standpoint or camera angle) may 
simulate the effects of rapid attentional saccades, and may 
draw on the same cognitive processes that are used to make sense 
out of saccadic glances. To the degree that this is true, the 
systematic study of Type II transitions is not only useful in 
achieving more precise rules for visual communication -- it is 
also an important tool for the study of perceptual attention. 
It is important, therefore, to identify the processes that
are responsible for the effectiveness of the filmmaker's
various techniques, and not to succumb too readily to the
temptation to consider those techniques to be merely arbitrary
conventions, invented by (and at the disposal of) the filmmaker.
But there remains one type of transition in common use
that is arbitrary: Films regularly contain transitions that 
have no internal stimulus information to connect them to each 
other: e.g., successions of views taken from different scenes 
(and/or at different times). The classic montage then becomes 
the temporal equivalent of a collage: A scene of the Golden 
Gate Bridge, of the Empire State Building, of the Eiffel Tower, 
in a sequence of three disjunctive shots, contains within the 
sequence no intrinsic information that can reveal the spatial
or temporal relationship between the three views. Here is 
where the viewer's knowledge, culture, and some body of arbi- 
trary filmic convention are needed to achieve comprehension: 
In this example, the first thing that is needed is, of course, 
the viewer's knowledge that the film has been constructed 
according to some deliberate purpose of the filmmaker, however
obscure or ill-conceived, and that therefore some connection 
between the views is to be searched for; second, the viewer must 
have some knowledge of what these landmarks are, and of their
relative geography; third, he must understand any arbitrary
signals that may be used (such as street signs, directional
arrows, calendar leaves, etc.) which were either established by
the filmmaker in the film itself, or by other filmmakers in 
the history of the medium. We don't know anything about these 
"literary" components of film making, in any scientific sense, 
and it is not clear that there are meaningful research ques- 
tions to be asked about them until we learn a great deal more 
about the more perceptual components, and about the relation- 
ships between the Type I, II and III cuts. We consider this 
last point below. 


(4.2.2) The interaction (and relative strengths) of
Type I, II, and III determinants


Smooth or continuous apparent movement occurs between
views in which an object's contours have been displaced by a
small amount (Type I transition). The determinants of such
apparent movement are probably most overlearned, and most ef- 
fective. We encounter sudden large displacements less frequently 
as a result of objects’ motions, and more frequently as the result
of our own saccadic eyemovements. In fact, the discontinuous 
displacement of contours in two successive views is probably 
one of the signals to the perceptual system that a saccade has 
occurred. There is some reason to believe (204) that
exploratory eyemovements "freeze" following any abrupt displacement
between successive views, suspending further glances
until the viewer comprehends the transition that has occurred.
And conversely, if major contours (and/or masses of light and 
dark seen in peripheral vision) fall in essentially the same 
places at two different moments in time, that fact should be
a signal to the oculomotor system that no eyemovement or other
change in viewpoint has occurred. Similarly, if there is on- 
going motion in one shot, there must be motion in the second 
shot as well, and the rate of motion in the second shot should 
be closely matched to that of the first shot or the discrep- 
ancies between velocities will act as a signal that a gross 
view-change has occurred. 


Filmmakers often maintain "smoothness" of cutting when 
showing different objects in successive views, by placing those 
objects so that their contours fall on the same places on the 
screen in the two views (i.e., so that no large abrupt displacement
of contour occurs). And similarly, if the two successive
views show quite different objects in motion, "smoothness"
may be achieved by making sure that the velocities of the
two different objects match when they are shown in the two discontinuous
successive views. Such matching of contours and of
velocities should minimize the "change" signals that the visual
system will normally receive from a discontinuous cut. But while
this suppression of change-signals should minimize "freezing" 
of eyemovements and other low-level results of visual disorienta- 
tion during the disjunctive transition, it may (at least the- 
oretically) be very misleading as well, and impair comprehension 
in several ways, precisely because a major transition in views 
has occurred but the visual system has not been alerted to that 
fact. 


The determinants of Type II transitions (transitions 
which simulate saccades and other changes of gaze) are also 
likely to be overlearned, and to act strongly and rapidly.
They should easily overcome the determinants of Type III transitions.
Type II transitions should be rapidly comprehended
(although there appears to be an inherent limitation of about 
200 msec for minimum processing time). The comprehensibility 
of Type II transitions may be impaired by interference from 
Type I determinants, however. For example, if the displacement 
between two views is such that one object's contours fall on or 
next to the place previously occupied by the contours of a dif- 
ferent object (Fig. 5), the second object may be interpreted by 
a low level of the perceptual nervous system as being the object 
shown in the first view. When this happens, a misleading or
inconsistent perception of the relative direction of the two 
views may occur, and this would retard or impair comprehension 
of the spatial meaning of the scene that is to be represented by 
the sequence of views. Moreover, if the scene from which the 
views are being taken lacks distributed masses of light and dark, 
so that each view that the observer receives contains no strong 
contours that are peripherally detectable, and that serve to
signal the direction in which each view is displaced in relation
to the preceding view (i.e., that serve to indicate which way 
the camera has moved), then it should theoretically take longer 
to comprehend a transition (and more errors should be made in 
comprehending it). Research here seems relatively straight- 
forward to design and execute. 


In summary, it is proposed that the following relation- 
ship obtains between the different kinds of transition both in 
terms of strength (when two sets of determinants are put into 
conflict) and in terms of speed of comprehension: 


Type I > Type II > Type III


This hypothesis can (if it is correct) generate a number 
of rules that should be followed in making transitions between 
views. These rules explain, as special cases of the general per- 
ceptual and cognitive principles involved, those "cutting rules" 
that have been stated by filmmakers, in print or in conversation. 
To the degree that they are objectively and quantitatively stated 
and established, these rules would permit us to improve (or impair)
comprehensibility, at will. And to the degree that comprehension-time
is an important factor in determining what
cutting rate should be (and what cutting rhythm will be like), 
these rules should be important to the study of cutting rate as 
well. They should apply to video as well as to theater motion 
pictures. There are special features of TV which must be taken 
into account in any attempt at application, however, and we consider
these factors next.


(4.2.3) Size and acuity limits of video and their
effects on transition interactions


It has been argued above (p. 51) that change-of-view is 
important to interest-maintenance and to affective response. 
Of the three types of transition, continuous motion (Type I 
transition) is clearly available to the TV display, and is exploited
in the dollies (and quasi-dollies in which two cameras
participate in what is essentially one movement) and tracking 
shots. Type II transitions are more difficult to use in the
usual TV display: because of the small screen and poor resolution,
and because little peripheral material can be provided,
closeups usually prevail in the views between which transitions
are being made, and there cannot be a great deal of overlap between
successive closeups. In many cases, and especially in talk
shows in which a long shot establishes the spatial locations of 
a succession of closeups of participants who are not going to 
move around much for the remainder of the presentation, the
knowledge of who is sitting where will maintain the relative 
direction of shots from one transition to the next. But the 
"talking heads" arrangement probably works because our percep- 
tual memory for faces is exceptionally good (130), because 
relative position is not as important to the viewer as who is
speaking, and because social convention (e.g., that a speaker 
faces the one he is addressing) helps maintain spatial compre- 
hensibility. Otherwise, the absence of overlap and the small 
size of the screen probably make all transitions act the way 
Type III transitions work in large-screen cinema: i.e., the 
viewer requires heavy doses of filmic convention and of non- 
visual information in order to relate one view to the next. 

For the same reasons, much lower base rates are probably needed 
to maintain a given degree of visual interest in TV compared to 
large screen cinema. Research on the effect of screen size (and 
of resolution limits) on transition-comprehension and on cutting 
rate and rhythm would seem to be essential; we know of no re- 
search that has been performed to obtain the necessary data. 


To some degree, the TV viewer can choose his own "screen
size," in the sense that he can increase the visual angle that
the screen subtends at his eye by moving closer to the screen. 
When this is done, the poor detail of the display becomes ob- 
trusive, and peculiar contour effects are probably generated 
(see p. 28), but in return the viewer obtains a display that is
large enough to provide him with peripheral vision, a factor that 
is probably important to the comprehension of Type I and Type II 
transitions. There are two inherent limitations, however, on 
the viewer's ability to increase the TV screen's contribution to his
peripheral vision by decreasing his viewing distance: 


(1) The first is the obvious fact that the resolution
of the screen is so poor that reducing the viewer's distance 
from the screen does not provide him with any additional detailed
information. What is worse, it may impair his grasp of
detail by increasing the degrading effects of the raster in
foveal vision, which, of course, is the only part of our vision
that is capable of using the detail to build up a mental pic- 
ture, in fine detail, of the scene that is being represented. 


(2) The second is the somewhat less obvious fact that 
a 12 inch screen viewed from a distance of 5 feet probably does 
not have exactly the same effects on the visual system as does 
a 24 inch screen at 10 feet. Even if the two screens were 
viewed in a totally dark room so that there were no visual in- 
dications of scale and distance, there are still certain firm 
corollaries of the actual sizes and distances that are built 
into the geometry of the head-movements and eyemovements that 
underlie our perceptions of individual shots and our comprehen- 
sions of sequences of shots. For example, with eyes stationary 
in the head, larger head movements would be needed to scan the 
near screen than the far one (this situation is one in which the 
subject changes the direction of his gaze without moving his 
eyes, an artificial condition but one which illustrates the 
relationship being described), even though the same eyemovement 
would scan the two screens if the head were fixed and the eyes
were free to move. If one determinant of how large an object 
looks is the relationship of eyemovements to head movements 
needed to scan it, the small screen will appear small and the 
large screen will appear large no matter where the viewer sits. 
And there are other oculomotor adjustments that will work to 
maintain this "size constancy independent of viewing distance": 
E.g., as the lens of the eye adjusts its focus to a nearer dis- 
tance, the apparent size of an object decreases, even though 
the visual angle is unchanged. 


Thus, a small screen may be perceived as a small screen, 
even when the viewer is close to it. On the other hand, if the 
screen size is doubled, and is viewed from twice the distance, 
the screen continues to subtend the same visual angles: Small 
screen at near distance, and large screen at far distance, 
then produce the same visual angle and the same retinal image. 
The following complicated situation then confronts the visuo- 
motor system: An object -- say, a man 6 feet in height -- is 
represented by an eight inch image on a 12 inch screen, viewed 
from a distance of 5 feet; he is also represented by a 16 inch 
image on a 24 inch screen, viewed from a distance of 10 feet. 
The man's image subtends a visual angle of arctan 0.13 or approximately
7°, so an eyemovement of 7° would scan him from
head to toe. A real man, 6 feet tall, would also subtend that 
visual angle and would be scanned by a 7° glance, if he stood 
about 46 feet away from the viewer. All of this presupposes 
a stationary head and a moving eye. If the viewer makes head 
or body movements, the situation changes drastically: With
the screen at 5 feet, the viewer would have to move his head
8 inches (if he did not move his eyes at all) in order to scan
the 8 inch image of the man; with the screen at 10 feet, he 
would have to move his head 6 feet! It is clear that the co- 
ordination of head and eyemovements cannot be based on the 
viewer's expectations about men, as objects, or those movements 
would be inappropriate to the actual size and distance of the 
screen. But it is also likely that there is some effect of 
such expectations (cf. 124: pp. 495ff; 510; 544), and that the 
real size of the man's image (8 inches or 16 inches), the angular
size of the image (7°) and the familiar and expected
objective size of the man (6 feet) all must interact in deter- 
mining the eyemovements that the viewer is prepared to make 
and the speed with which the visuomotor system "decides" how 
to respond to such conflicting information. If conflicting 
information about size causes the eye to take longer to "un- 
freeze" after each transition or cut, then a given cutting 
rate may not have the same effects on comprehensibility (p. 53), 
on arousal (p. 13, 51), and even on conditioned affective or 
emotional concomitants (p. 12,13), as would the same cutting 
rate without size conflicts. 
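
The visual-angle arithmetic used in this example can be checked with a short sketch. It follows the text in taking the angle as the arctangent of size over distance; the function names visual_angle_deg and distance_for_angle are introduced here for illustration, and all lengths are expressed in inches.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle, in degrees, of an object of a given size at a given
    distance (same units), computed as arctan(size / distance)."""
    return math.degrees(math.atan(size / distance))


def distance_for_angle(size, angle_deg):
    """Distance at which an object of a given size subtends angle_deg."""
    return size / math.tan(math.radians(angle_deg))


near = visual_angle_deg(8, 5 * 12)     # 8 inch image viewed from 5 feet
far = visual_angle_deg(16, 10 * 12)    # 16 inch image viewed from 10 feet
real_man_feet = distance_for_angle(6 * 12, near) / 12

print(near, far, real_man_feet)
```

The two images subtend the same angle, a little over 7.5 degrees. With the unrounded ratio of 8/60 the equivalent distance for a real 6 foot man comes to 45 feet; the figure of about 46 feet in the text follows from rounding the ratio to 0.13.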





Viewing distance, subject matter, and screen size all 
may interact, therefore, in a complex but lawful way to determine
the ways in which cutting rate affects comprehensibility,
interest-maintenance, and affect, in TV viewing. Whether or 
not full scale research would be fruitful depends not only on 
the existence of such interactions, but on whether they are 
large enough and reliable enough to be important. We know of 
no research on how screen size and viewing distance affect 
response to video displays. Preliminary research is needed to 
determine whether strong effects of this kind exist, using com- 
prehensibility, attention, and ratings as dependent variables; 
if they do, more sophisticated physiological and psychophysical 
measures (cf. pp. 7,12) should certainly be applied to this 
study: If screen size/viewing distance is an important factor 
at all, it is probably a very important one indeed. 


FIGURE 1: Visual angle as a measure of stimulus size:
Objects A and B subtend the same angle at the 
eye although they are of different physical size 

FIGURE 2: Visual angles subtended by TV display and ... Motion

FIGURE 3: By measuring the time course of
a response over n repetitions,
and obtaining the average measure
at each point in time, the accidental
perturbations can be removed
to reveal the underlying
transients.

FIGURE 5: Succession involving
conflict between Type 
I and Type II deter- 
minants 